
Keyword Search Result

[Keyword] SI(16314hit)

9981-10000hit(16314hit)

  • Robust Dependency Parsing of Spontaneous Japanese Spoken Language

    Tomohiro OHNO  Shigeki MATSUBARA  Nobuo KAWAGUCHI  Yasuyoshi INAGAKI  

     
    PAPER-Speech Corpora and Related Topics

      Vol:
    E88-D No:3
      Page(s):
    545-552

    Spontaneously spoken Japanese includes many grammatically ill-formed linguistic phenomena, such as fillers, hesitations, and inversions, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the effectiveness and robustness of the method on spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese that includes fillers, inversions, or dependencies spanning utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and confirm that it is effective to utilize the location information of a bunsetsu and the distance between bunsetsus as stochastic information.

  • Automatic Generation of Non-uniform and Context-Dependent HMMs Based on the Variational Bayesian Approach

    Takatoshi JITSUHIRO  Satoshi NAKAMURA  

     
    PAPER-Feature Extraction and Acoustic Modeling

      Vol:
    E88-D No:3
      Page(s):
    391-400

    We propose a new method for automatically creating non-uniform, context-dependent HMM topologies and for selecting the number of mixture components, based on the Variational Bayesian (VB) approach. Although the Maximum Likelihood (ML) criterion is generally used to create HMM topologies, it suffers from over-fitting. Recently, to avoid this problem, the VB approach has been applied to create acoustic models for speech recognition. We introduce the VB approach into the Successive State Splitting (SSS) algorithm, which can create both contextual and temporal variations for HMMs. Experimental results indicate that the proposed method can automatically create a more efficient model than the original method. We also evaluated a method that increases the number of mixture components by using the VB approach while considering temporal structures. The VB approach obtained almost the same performance as the ML-based methods while using a smaller number of mixture components.

  • Automatic Scoring for Prosodic Proficiency of English Sentences Spoken by Japanese Based on Utterance Comparison

    Yoichi YAMASHITA  Keisuke KATO  Kazunori NOZAWA  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E88-D No:3
      Page(s):
    496-501

    This paper describes techniques for scoring the prosodic proficiency of English sentences spoken by Japanese learners. A multiple regression model predicts the prosodic proficiency using new prosodic measures based on the characteristics of Japanese novice learners of English. The prosodic measures are calculated by comparing prosodic parameters, such as F0, power, and duration, of the learner's and a native speaker's speech. The new measures include the approximation error of the fitting line and the comparison of prosodic parameters over a limited segment around the word boundary rather than the whole utterance. This paper reveals that the introduction of the new measures improved the correlation between the teachers' scores and the automatic scores by 0.1.

  • QoS Multicast Protocol for Live Streaming

    Yuthapong SOMCHIT  Aki KOBAYASHI  Katsunori YAMAOKA  Yoshinori SAKAI  

     
    PAPER-Network

      Vol:
    E88-B No:3
      Page(s):
    1128-1138

    Live streaming media are delay-sensitive and have limited allowable delays. Current conventional multicast protocols do not have a loss retransmission mechanism. Even though several reliable multicast protocols with retransmission mechanisms have been proposed, their long delay and high packet loss rate make them inefficient for live streaming. This paper proposes a multicast protocol, called the QoS Multicast for Live Streaming (QMLS) protocol, that focuses on the allowable delay. QMLS routers are placed along the multicast tree to detect and retransmit lost packets. We propose a method that enables data recovery to be performed immediately after lost packets are detected by the QMLS router, and a method that reduces the unnecessary packets sent to end receivers. This paper discusses the mathematical analysis of the proposed protocol and compares it with other multicast protocols. The results reveal that our protocol is more effective for live streaming. Finally, we conduct a simulation to evaluate its performance and study the effect of consecutive losses. The simulation reveals that consecutive losses only slightly increase the loss rate under our protocol.

  • A Partial Norm Based Early Rejection Algorithm for Fast Motion Estimation

    Won-Gi HONG  Young-Ro KIM  Tae-Myoung OH  Sung-Jea KO  

     
    PAPER

      Vol:
    E88-A No:3
      Page(s):
    626-632

    Recently, many algorithms have been proposed for fast full search motion estimation. Among them, the successive elimination algorithm (SEA) and its modified versions significantly speed up the full search algorithm. By introducing an inequality between the norm and the mean absolute difference (MAD) of two matching blocks, the SEA can successively eliminate invalid candidate blocks without any loss in estimation accuracy. In this paper, we propose a partial norm based early rejection algorithm (PNERA) for fast block motion estimation. The proposed algorithm employs the sum of partial norms from several subblocks of the block. Applying the sum of partial norms to the inequality, we can significantly reduce the computational complexity of the full search algorithm. In an attempt to reduce the computational load further, modified algorithms using partial norm distortion elimination (PNDE) and subsampling methods are also proposed. Experimental results show that the proposed algorithm is about 4 to 9 times faster than the original exhaustive full search, and about 3 to 4 times faster than the SEA.
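    The elimination test described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the block shapes, the two-way split, and the use of SAD (the sum form of MAD) are assumptions, and the PNDE and subsampling refinements are omitted. The key fact is that the SEA bound |sum(R) - sum(C)| <= SAD(R, C) also holds on each subblock, and summing the per-subblock bounds gives a tighter lower bound on the SAD.

    ```python
    import numpy as np

    def sad(block_a, block_b):
        """Sum of absolute differences between two equal-sized blocks."""
        return np.abs(block_a.astype(int) - block_b.astype(int)).sum()

    def partial_norm_reject(ref_block, cand_block, best_sad, splits=2):
        """Early-rejection test: sum the |sum(R_i) - sum(C_i)| bounds over
        subblocks; if this lower bound already exceeds the best SAD found
        so far, the candidate cannot be the best match."""
        bound = 0
        for sub_r, sub_c in zip(np.array_split(ref_block, splits),
                                np.array_split(cand_block, splits)):
            bound += abs(int(sub_r.sum()) - int(sub_c.sum()))
        return bound > best_sad  # True -> reject without computing full SAD
    ```

    A rejected candidate never needs its full SAD computed, which is where the speed-up over exhaustive search comes from.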

  • An Adaptive Dynamic Buffer Management (ADBM) Approach for Input Buffers in ATM Networks

    Ricardo CITRO  Tony S. LEE  Seong-Soon JOO  Sumit GHOSH  

     
    PAPER-Switching for Communications

      Vol:
    E88-B No:3
      Page(s):
    1084-1096

    Current literature on input buffer management reveals that, in representative ATM networks under highly bursty traffic conditions, the fuzzy thresholding approach yields lower cell loss rate at the cost of lower throughput. Also, under less bursty traffic, the traditional fixed thresholding approach achieves higher throughput at the expense of higher cell loss rate. The integration of these two properties into practice is termed adaptive dynamic buffer management (ADBM) approach for input buffers and its assessment is the objective of this paper. The argument is that, given that the traffic conditions are constantly changing, to achieve efficiency during actual operation, the network control must dynamically switch, at every ATM switch, under the call processor's control, between the two input buffer management techniques, dictated by the nature of the traffic at the inputs of the corresponding switch. The need to involve the call processor marks the first effort in the literature to dynamically configure input buffer management architectures at the switch fabric level under higher level call processor control. It stems from the fact that the switch fabric operates very fast and cannot engage in complex decision making without incurring stiff penalty. To achieve this goal, the network control needs knowledge of the burstiness of the traffic at the inputs of every ATM switch. The difficulties with this need are two-fold. First, it is not always easy to obtain the traffic model and model parameters for a specific user's call. Second, even where the traffic model and the model parameters are known for a specific user's call, this knowledge is valid only at the source switch where the user interfaces with the network. At all other switches in the network, the cells of the traffic in question interact asynchronously with the cells from other traffic sources and are subject to statistical multiplexing. 
Thus, obtaining the exact nature of the composite traffic at the inputs of any ATM switch is a challenge. Conceivably, one may determine the burstiness by counting the number of cells arriving at the inputs of an ATM switch over a defined time interval. The challenge posed by this proposition lies in the very definition of burstiness, in that the time interval must approach, in the limit, zero or the resolution of time in the network. To address this challenge, first, a 15-node representative ATM network is modeled in an asynchronous, distributed simulator and, second, simulated on a network of workstations under realistic traffic stimuli. Third, burstiness indices are measured for the synthetic, stochastic traffic at the inputs of every ATM switch as a function of the progress of the simulation for different choices of time interval values, ranging from 20,000 timesteps down to 1,000 timesteps. A timestep equals 2.73 µs. Results reveal that consistent burstiness indices are obtained for interval choices between 1,000 and 5,000 timesteps, and that a burstiness index of 25, measured at a 3,000-timestep interval, constitutes a reasonable and practical threshold value that distinguishes highly bursty traffic, which warrants the use of the fuzzy thresholding approach, from less bursty traffic, which can benefit from the fixed thresholding scheme. A comparative performance analysis of ADBM yields the following. For the pure fixed and pure fuzzy thresholding schemes, the throughputs are 73.88% and 71.53% while the cell drop rates are 4.31% and 2.44%, respectively. For the ADBM approach, where the input buffer management alternates at each individual ATM switch between the fixed and fuzzy schemes, governed by a measured burstiness-index threshold of 25 for a 3,000-timestep interval, the throughput is 74.77%, which is higher than even that of the pure fixed scheme, while the cell drop rate is 2.21%, which is lower than that of the pure fuzzy scheme. 
In essence, ADBM successfully integrates the best characteristics of the fuzzy and fixed thresholding schemes.
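    The interval-based burstiness measurement described above can be sketched as follows. The abstract does not give the exact index computation, so treating the per-interval cell count as the burstiness index, and the function names below, are assumptions for illustration only.

    ```python
    from collections import Counter

    def interval_counts(arrival_timesteps, interval=3000):
        """Bucket cell arrivals into fixed-length measurement intervals.
        The paper reports that intervals of 1,000-5,000 timesteps (at
        2.73 us per timestep) give consistent burstiness indices."""
        bins = Counter(t // interval for t in arrival_timesteps)
        n = max(bins) + 1 if bins else 0
        return [bins.get(i, 0) for i in range(n)]

    def select_scheme(index, threshold=25):
        """ADBM-style switching rule (sketch): traffic whose measured
        burstiness index exceeds the threshold uses fuzzy thresholding,
        otherwise the fixed thresholding scheme."""
        return "fuzzy" if index > threshold else "fixed"
    ```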

  • A New 3-D Display Method Using 3-D Visual Illusion Produced by Overlapping Two Luminance Division Displays

    Hideaki TAKADA  Shiro SUYAMA  Kenji NAKAZAWA  

     
    PAPER-Electronic Displays

      Vol:
    E88-C No:3
      Page(s):
    445-449

    We are developing a simple three-dimensional (3-D) display method that uses only two transparent images on luminance division displays, without any extra equipment. This method can be applied not only to electronic displays but also to printed sheets. The method utilizes a 3-D visual illusion in which two ordinary images with many edges can be perceived as an apparent 3-D image with continuous depth between the two image planes, when two identical images are overlapped along the line from the midpoint of the observer's eyes and their optical-density ratio is changed according to the desired image depths. We can use transparent printed sheets or transparent liquid crystal displays to present the two overlapping transparent images with this 3-D display method. Subjective test results show that the perceived depths changed continuously as the optical-density ratio changed. Deviations of the perceived depths from the average for each observer were sufficiently small. The depths perceived by all six observers coincided well.

  • Acquisition and Modeling of Driving Skills by Using Three Dimensional Driving Simulator

    Jong-Hae KIM  Yoshimichi MATSUI  Soichiro HAYAKAWA  Tatsuya SUZUKI  Shigeru OKUMA  Nuio TSUCHIDA  

     
    PAPER-Intelligent Transport System

      Vol:
    E88-A No:3
      Page(s):
    770-778

    This paper presents an analysis of the stopping maneuver of the human driver using a new three-dimensional driving simulator based on CAVE, which provides stereoscopic immersive vision. First, the difference in driving behavior between the 3D and 2D virtual environments is investigated. Second, the Group Method of Data Handling (GMDH) is applied to the measured data in order to build a mathematical model of driving behavior. From the obtained model, it is found that acceleration information is less important in the stopping maneuver under both the 2D and 3D environments.

  • Feature Extraction with Combination of HMT-Based Denoising and Weighted Filter Bank Analysis for Robust Speech Recognition

    Sungyun JUNG  Jongmok SON  Keunsung BAE  

     
    LETTER

      Vol:
    E88-D No:3
      Page(s):
    435-438

    In this paper, we propose a new feature extraction method that combines HMT-based denoising and weighted filter bank analysis for robust speech recognition. The proposed method consists of two stages in cascade. The first stage is a denoising process based on the wavelet-domain Hidden Markov Tree model, and the second is filter bank analysis with weighting coefficients obtained from the residual noise of the first stage. To evaluate the performance of the proposed method, recognition experiments were carried out for additive white Gaussian and pink noise with signal-to-noise ratios from 25 dB to 0 dB. Experimental results demonstrate the superiority of the proposed method over conventional ones.

  • A Kernel-Based Fisher Discriminant Analysis for Face Detection

    Takio KURITA  Toshiharu TAGUCHI  

     
    PAPER-Pattern Recognition

      Vol:
    E88-D No:3
      Page(s):
    628-635

    This paper presents a modification of kernel-based Fisher discriminant analysis (FDA) to design a one-class classifier for face detection. In face detection, it is reasonable to assume that "face" images cluster in a certain way, but "non face" images usually do not cluster, since many different kinds of images are included. It is difficult to model "non face" images as a single distribution in the discriminant space constructed by the usual two-class FDA. Moreover, the dimension of the discriminant space constructed by the usual two-class FDA is bounded by 1, which means that we cannot obtain a higher dimensional discriminant space. To overcome these drawbacks, the discriminant criterion of FDA is modified such that the trace of the covariance matrix of the "face" class is minimized and the sum of squared errors between the average vector of the "face" class and the feature vectors of "non face" images is maximized. By this modification a higher dimensional discriminant space can be obtained. Experiments were conducted on "face" and "non face" classification using face images gathered from available face databases and many face images on the Web. The results show that the proposed method outperforms the support vector machine (SVM). A close relationship between the proposed kernel-based FDA and kernel-based Principal Component Analysis (PCA) is also discussed.
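    The modified criterion can be illustrated with a linear (non-kernel) toy version. The paper works in a kernel-induced feature space, so the function below is only a hedged sketch of the underlying generalized eigenproblem: minimize "face" covariance while maximizing the scatter of "non face" samples around the "face" mean; the function name and regularizer are assumptions.

    ```python
    import numpy as np

    def one_class_fda(face, nonface, n_dims=2, reg=1e-6):
        """Linear sketch: find directions w maximizing (w'Bw)/(w'Sw w),
        where Sw is the "face" covariance and B is the scatter of
        "non face" samples around the "face" mean, via eig(inv(Sw) B)."""
        mu = face.mean(axis=0)
        Sw = np.cov(face, rowvar=False) + reg * np.eye(face.shape[1])
        D = nonface - mu
        B = D.T @ D / len(nonface)  # non-face scatter about the face mean
        evals, evecs = np.linalg.eig(np.linalg.solve(Sw, B))
        order = np.argsort(-evals.real)
        return evecs.real[:, order[:n_dims]]  # top discriminant directions
    ```

    Unlike two-class FDA, whose discriminant space here would be one-dimensional, this criterion yields as many directions as there are non-trivial eigenvectors.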

  • Subspace-Based Interference Suppression Technique for Long-Code Downlink CDMA Adaptive Receiver

    Samphan PHROMPICHAI  Peerapol YUVAPOOSITANON  Phaophak SIRISUK  

     
    PAPER

      Vol:
    E88-A No:3
      Page(s):
    676-684

    This paper presents a multiple-constrained subspace-based multiuser detector for synchronous long-code downlink multirate DS-CDMA systems. The novel receiver adapts its fractionally spaced equaliser tap-weights in two modes, namely a training mode and a decision-directed mode. Switching between the two modes is achieved by changing the code constraint in the associated subspace algorithm. Moreover, detection of the desired user requires knowledge of the desired user's spreading code only. Simulation results show that the proposed receiver is capable of multiple access interference (MAI) suppression and multipath mitigation. Furthermore, the results reveal an improvement in convergence speed and mean square error (MSE) of the proposed receiver over the existing receiver in both static and dynamic environments.

  • Fast Macroblock Mode Determination to Reduce H.264 Complexity

    Ki-Hun HAN  Yung-Lyul LEE  

     
    LETTER-Image

      Vol:
    E88-A No:3
      Page(s):
    800-804

    The rate-distortion optimization (RDO) method is an effective technique that improves the coding efficiency, but increases the computational complexity, of the H.264 encoder. In this letter, a fast macroblock mode determination algorithm is proposed to reduce the computational complexity of the H.264 encoder. The proposed method reduces the encoder complexity by 55% while maintaining the same level of coding efficiency.

  • SDC: A Scalable Approach to Collect Data in Wireless Sensor Networks

    Niwat THEPVILOJANAPONG  Yoshito TOBE  Kaoru SEZAKI  

     
    PAPER-Software Platform Technologies

      Vol:
    E88-B No:3
      Page(s):
    890-902

    In this paper, we present the Scalable Data Collection (SDC) protocol, a tree-based protocol for collecting data over multi-hop wireless sensor networks. The design of the protocol aims to satisfy the requirement of sensor networks that every sensor transmit sensed data to a sink node periodically or spontaneously. The sink node constructs the tree by broadcasting a HELLO packet to discover its child nodes. A sensor receiving this packet chooses an appropriate parent to which it will attach; it then broadcasts the HELLO packet to discover its own child nodes. Through this process, the tree is quickly created without flooding any routing packets. SDC avoids periodic updating of routing information, but the tree is reconstructed upon node failures or the addition of new nodes. The state required on each sensor is constant and independent of network size, so SDC scales better than existing protocols. Moreover, each sensor can make forwarding decisions without any knowledge of geographical information. We evaluate the performance of SDC by using the ns-2 simulator and comparing it with Directed Diffusion, DSR, AODV, and OLSR. The simulation results demonstrate that SDC achieves much higher delivery ratio and lower delay, as well as scalability, in various scenarios.
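    The HELLO-based tree construction described above can be sketched as a breadth-first traversal from the sink. This is a hedged model, not the protocol itself: the parent-selection metric is abstracted into "first HELLO heard wins", and radio effects, failures, and reconstruction are omitted.

    ```python
    from collections import deque

    def build_sdc_tree(adjacency, sink):
        """Model of SDC tree setup: the sink 'broadcasts' HELLO; each
        node adopts the first sender it hears as its parent, then
        rebroadcasts. Returns a parent map (sink's parent is None)."""
        parent = {sink: None}
        queue = deque([sink])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in parent:  # first HELLO received wins
                    parent[neighbor] = node
                    queue.append(neighbor)
        return parent
    ```

    Each node stores only its parent, which mirrors the constant per-node state the protocol claims.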

  • Hybrid Beamforming Scheme for CDMA Systems to Overcome the Insufficient Pilot Power Problem in Correlated SIMO Channels

    Young-Kwan CHOI  DongKu KIM  

     
    LETTER-Antennas and Propagation

      Vol:
    E88-B No:3
      Page(s):
    1303-1306

    A hybrid beamformer composed of a direction-of-arrival (DOA) based scheme and maximal ratio combining (MRC) is proposed to overcome the degradation caused by inaccurate channel estimation due to insufficient pilot power, which occurs in the conventional single-input, multiple-output (SIMO) Code Division Multiple Access (CDMA) reverse link. The proposed scheme provides more accurate channel estimation and interference reduction at the expense of diversity gain in the spatially correlated SIMO channel. As a result, the hybrid scheme outperforms conventional MRC beamformers for six or more antennas in channel environments in which the angle-of-spread (AOS) is within 30°.

  • Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones

    Weifeng LI  Tetsuya SHINDE  Hiroshi FUJIMURA  Chiyomi MIYAJIMA  Takanori NISHINO  Katunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Feature Extraction and Acoustic Modeling

      Vol:
    E88-D No:3
      Page(s):
    384-390

    This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) the method does not require a sensitive geometric layout, calibration of the sensors, or additional pre-processing for tracking the speech source; 2) the system requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where the close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison with the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
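    The regression step can be sketched as follows. The array shapes, the bias term, and the per-frequency-bin independent least-squares fit are illustrative assumptions, not the paper's exact formulation.

    ```python
    import numpy as np

    def train_mrls(distant, close):
        """Learn per-bin regression weights (one per microphone plus a
        bias) mapping distant-microphone log spectra to the close-talking
        log spectrum, by least squares over training frames.
        distant: (T, M, F) log spectra; close: (T, F). Returns (M+1, F)."""
        T, M, F = distant.shape
        W = np.empty((M + 1, F))
        for f in range(F):
            X = np.hstack([distant[:, :, f], np.ones((T, 1))])
            W[:, f] = np.linalg.lstsq(X, close[:, f], rcond=None)[0]
        return W

    def estimate_close_talking(distant, W):
        """Apply learned weights; no close-talking microphone is needed."""
        T, M, F = distant.shape
        out = np.empty((T, F))
        for f in range(F):
            X = np.hstack([distant[:, :, f], np.ones((T, 1))])
            out[:, f] = X @ W[:, f]
        return out
    ```

    At recognition time only `estimate_close_talking` runs, which is why the method's computational cost is small.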

  • A GFSK Transmitter Architecture for a Bluetooth RF-IC, Featuring a Variable-Loop-Bandwidth Phase-Locked Loop Modulator

    Masaru KOKUBO  Takashi OSHIMA  Katsumi YAMAMOTO  Kunio TAKAYASU  Yoshiyuki EZUMI  Shinya AIZAWA  

     
    PAPER-Microwaves, Millimeter-Waves

      Vol:
    E88-C No:3
      Page(s):
    385-394

    The use of a two-point modulator with variable PLL loop bandwidth as a GFSK signal generator is proposed. Delta-sigma modulation is adopted for the modulator. Through the combination of a variable PLL feedback loop and delta-sigma modulation, both a fast settling time and a very clear eye opening are achieved. We fabricated the modulator in a 0.35-µm BiCMOS process technology. Owing to the PLL feedback loop, the two-point modulator has a center-frequency drift of only 14.9 kHz, much lower than the 178-kHz result over a single time slot in the case of direct VCO modulation. Evaluation also confirmed that the circuit satisfies the various characteristics required of a Bluetooth transmitter. The two-point modulator is also applicable to other transceivers that use FSK or PSK modulation, i.e., forms of modulation in which a constant signal level is transmitted, and thus contributes to the simplification of a range of wireless transmitters.

  • Applying Sparse KPCA for Feature Extraction in Speech Recognition

    Amaro LIMA  Heiga ZEN  Yoshihiko NANKAKU  Keiichi TOKUDA  Tadashi KITAMURA  Fernando G. RESENDE  

     
    PAPER-Feature Extraction and Acoustic Modeling

      Vol:
    E88-D No:3
      Page(s):
    401-409

    This paper presents an analysis of the applicability of Sparse Kernel Principal Component Analysis (SKPCA) for feature extraction in speech recognition, as well as a proposed approach to make the SKPCA technique feasible for a large amount of training data, which is a usual context in speech recognition systems. Although KPCA (Kernel Principal Component Analysis) has proved to be an efficient technique for speech recognition, it has the disadvantage of requiring training data reduction when the amount of data is excessively large. This data reduction is important to avoid computational infeasibility and/or an extremely high computational burden in the feature representation step of the training and test data evaluations. The standard approach to this data reduction is to randomly choose frames from the original data set, which does not necessarily provide a good statistical representation of the original data set. To solve this problem, a likelihood-related re-estimation procedure was applied to the KPCA framework, creating the SKPCA, which nevertheless is not feasible for large training databases. The proposed approach consists of clustering the training data and applying to these clusters an SKPCA-like data reduction technique, generating reduced data clusters. These reduced data clusters are merged and reduced in a recursive procedure until just one cluster is obtained, making the SKPCA approach feasible for a large amount of training data. The experimental results show the efficiency of the SKPCA technique with the proposed approach over KPCA with the standard sparse solution using randomly chosen frames and over the standard feature extraction techniques.

  • Centralized Channel Allocation Technique to Alleviate Exposed Terminal Problem in CSMA/CA-Based Mesh Networks--Solution Employing Chromatic Graph Approach--

    Atsushi FUJIWARA  Yoichi MATSUMOTO  

     
    PAPER-Network

      Vol:
    E88-B No:3
      Page(s):
    958-964

    This paper proposes a channel allocation principle that prevents TCP throughput degradation in multihop transmissions in a mesh network based on the carrier sense multiple access with collision avoidance (CSMA/CA) MAC protocol. We first address the relationship between the network topology of wireless nodes and the TCP throughput degradation based on computer simulations. The channel allocation principle is then discussed in terms of its resolution into a coloring problem based on throughput degradation. The number of channels required for the proposed channel allocation principle is evaluated, and it is shown that two channels are sufficient for more than 96% of the simulated multihop patterns. The proposed channel allocation principle is extendable to generic mesh networks. We also clarify the number of channels required for mesh networks. The simulation results show that three channels are sufficient for more than 98% of the patterns in generic mesh networks when the number of nodes is less than 10.
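    The coloring formulation can be illustrated with a simple greedy heuristic. This is a generic largest-degree-first graph coloring, not necessarily the allocation algorithm the paper uses; the conflict-graph representation (vertices are links or nodes that expose each other) is an assumption.

    ```python
    def allocate_channels(conflict_graph):
        """Greedy coloring sketch: vertices that conflict (e.g. exposed
        terminals) must not share a channel; each vertex, visited in
        decreasing-degree order, takes the smallest channel not used by
        its already-colored neighbors."""
        channel = {}
        for v in sorted(conflict_graph, key=lambda u: -len(conflict_graph[u])):
            used = {channel[u] for u in conflict_graph[v] if u in channel}
            c = 0
            while c in used:
                c += 1
            channel[v] = c
        return channel
    ```

    On chain-like conflict graphs this heuristic needs only two channels, consistent with the paper's finding that two channels suffice for most multihop patterns.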

  • Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification

    Hiroyoshi YAMAMOTO  Yoshihiko NANKAKU  Chiyomi MIYAJIMA  Keiichi TOKUDA  Tadashi KITAMURA  

     
    PAPER-Feature Extraction and Acoustic Modeling

      Vol:
    E88-D No:3
      Page(s):
    418-424

    This paper investigates parameter tying structures of a mixture of factor analyzers (MFA) and discriminative training of MFA for speaker identification. The parameters of the factor loading matrices or the diagonal matrices are shared across different mixture components of MFA. Then, minimum classification error (MCE) training is applied to the MFA parameters to enhance the discrimination ability. The results of a text-independent speaker identification experiment show that MFA outperforms the conventional Gaussian mixture model (GMM) with diagonal or full covariance matrices and achieves the best performance when the diagonal matrices are shared, resulting in a relative gain of 26% over the GMM with diagonal covariance matrices. The improvement is especially significant under sparse training data conditions. The recognition performance is further improved by MCE training, with an additional 3% error reduction.

  • A VoiceFont Creation Framework for Generating Personalized Voices

    Takashi SAITO  Masaharu SAKAMOTO  

     
    PAPER-Speech Synthesis and Prosody

      Vol:
    E88-D No:3
      Page(s):
    525-534

    This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task, which has become a critical issue for speech synthesis systems that attempt to synthesize many high-quality voice personalities. The framework we propose here aims to drastically reduce this burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Second, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created using the framework, we conducted a listening test of speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.
