Keyword Search Result

[Keyword] TE(21534hit)

12061-12080hit(21534hit)

  • Exploiting Intelligence in Fighting Action Games Using Neural Networks

    Byeong Heon CHO  Sung Hoon JUNG  Yeong Rak SEONG  Ha Ryoung OH  

     
    PAPER-Biocybernetics, Neurocomputing

      Vol:
    E89-D No:3
      Page(s):
    1249-1256

    This paper proposes novel methods to provide intelligence for characters in fighting action games by using neural networks. First, how a character learns basic game rules and matches against randomly acting opponents is considered. Since each action takes more than one time unit in general fighting action games, the results of a character's action are exposed not immediately but several time units later. We evaluate the fitness of a decision by using the relative score change caused by the decision. Whenever the scores of fighting characters are changed, the decision causing the score change is identified, and then the neural network is trained by using the score difference and the previous input and output values which induced the decision. Second, how to cope more properly with opponents that act with predefined action patterns is addressed. The opponents' past actions are utilized to find the optimal counter-actions for the patterns. Lastly, a method for learning moving actions is proposed. To evaluate the performance of the proposed algorithm, we implement a simple fighting action game. Then the proposed intelligent character (IC) fights with the opponent characters (OCs) which act randomly or with predefined action patterns. The results show that the IC understands the game rules and finds the optimal counter-actions for the opponents' action patterns by itself.

  • A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging

    Masakiyo FUJIMOTO  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    922-930

    This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech recognition in noise. In the proposed method, a noise sequence is estimated in three stages: a sequential importance sampling step, a residual resampling step, and finally a Markov chain Monte Carlo step with Metropolis-Hastings sampling. The estimated noise sequence is used in the MMSE-based clean speech estimation. We also introduce Polyak averaging and feedback into a state transition process for particle filtering. In the evaluation results, we observed that, in non-stationary noise environments, the proposed method improves speech recognition accuracy compared with a noise compensation method with stationary noise assumptions.

  • Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework

    Sakriani SAKTI  Satoshi NAKAMURA  Konstantin MARKOV  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    946-953

    Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined based on the Bayesian framework to achieve better inference and to better account for modeling uncertainty. The approach we adopted here is to utilize the benefits of the Bayesian framework to improve acoustic model precision in speech recognition systems by modeling a wider-than-triphone context, which is approximated using several less context-dependent models. Such a composition was developed in order to avoid the crucial problem of limited training data and to reduce the model complexity. To enhance the model reliability in the presence of unseen contexts and limited training data, flooring and smoothing techniques are applied. Experimental results show that the proposed Bayesian pentaphone model improves word accuracy in comparison with the standard triphone model.

  • Training Augmented Models Using SVMs

    Mark J.F. GALES  Martin I. LAYTON  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    892-899

    There has been significant interest in developing new forms of acoustic model, in particular models which allow dependencies to be represented beyond those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows dependencies within the data to be modelled beyond those represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM, generative kernel, framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.

  • Production-Oriented Models for Speech Recognition

    Erik MCDERMOTT  Atsushi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    1006-1014

    Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof; state-to-state evolution is modeled only crudely, with no explicit relationship between states, such as would be afforded by the use of phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article attempts to provide an overview of proposals by several researchers for improving acoustic modeling in these regards. Such topics as the controversial Motor Theory of Speech Perception, work by Hogden explicitly using a continuity constraint in a pseudo-articulatory domain, the Kalman filter based Hidden Dynamic Model, and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech, will be covered.

  • Single-Channel Multiple Regression for In-Car Speech Enhancement

    Weifeng LI  Katsunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1032-1039

    We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on the nonlinear regression of the log spectra of the noisy signal captured by a distant microphone and the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation of the regression-enhanced speech, and performs best in most objective measures. Based on our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach shows an advantage in average relative word error rate (WER) reductions of 50.8% and 13.1%, respectively, compared to original noisy speech and ETSI advanced front-end (ETSI ES 202 050).

  • ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

    Shigeki MATSUDA  Takatoshi JITSUHIRO  Konstantin MARKOV  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    989-997

    In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR and speaking style. It is difficult to recognize speech affected by various factors, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one model for each different condition. Even though the robustness of each acoustic model is limited, the whole ASR system can handle various conditions appropriately. In our system, there are two recognition sub-systems which use different features such as MFCC and Differential MFCC (DMFCC). Each sub-system has several acoustic models depending on SNR, speaker gender and speaking style, and during recognition each acoustic model is adapted by fast noise adaptation. From each sub-system, one hypothesis is selected based on posterior probability. The final recognition result is obtained by combining the best hypotheses from the two sub-systems. On the AURORA-2J task used widely for the evaluation of noise robustness, our system achieved higher recognition performance than a system which contains only a single model. Also, our system was tested using normal and hyper-articulated speech contaminated by several background noises, and exhibited high robustness to noise and speaking styles.

  • Prototype Implementation of Real-Time ML Detectors for Spatial Multiplexing Transmission

    Toshiaki KOIKE  Yukinaga SEKI  Hidekazu MURATA  Susumu YOSHIDA  Kiyomichi ARAKI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    845-852

    We developed two types of practical maximum-likelihood detectors (MLD) for multiple-input multiple-output (MIMO) systems, using a field programmable gate array (FPGA) device. For implementations, we introduced two simplified metrics called a Manhattan metric and a correlation metric. Using the Manhattan metric, the detector needs no multiplication operations, at the cost of a slight performance degradation within 1 dB. Using the correlation metric, the MIMO-MLD can significantly reduce the complexity in both multiplications and additions without any performance degradation. This paper demonstrates the bit-error-rate performance of these MLD prototypes at a 1 Gbps-order real-time processing speed, through the use of an all-digital baseband 4×4 MIMO testbed integrated on the same FPGA chip.

  • Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

    Ian R. LANE  Tatsuya KAWAHARA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    931-938

    Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an English/Japanese speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.

  • A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency

    Sakriani SAKTI  Konstantin MARKOV  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    954-961

    The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediately preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that they are insufficient for capturing all of the coarticulation effects. A wider phonetic context seems to be more appropriate, but often suffers from the data sparsity problem and memory constraints. Therefore, an efficient modeling of wider contexts needs to be addressed to achieve a realistic application for an automatic speech recognition system. This paper presents a new method of modeling pentaphone-context units using the hybrid HMM/BN acoustic modeling framework. Rather than modeling pentaphones explicitly, in this approach the probabilistic dependencies between the triphone context unit and the second preceding/following contexts are incorporated into the triphone state output distributions by means of the BN. The advantages of this approach are that we are able to extend the modeled phonetic context within the triphone framework, and we can use a standard decoding system by assuming the next preceding/following context variables to be hidden during the recognition. To handle the increased parameter number, tying using knowledge-based phoneme classes and a data-driven clustering method is applied. The evaluation experiments indicate that the proposed model outperforms the standard HMM-based triphone model, achieving a 9-10% relative word error rate (WER) reduction.

  • A Development of Circuit Emulation System on TDM over Ethernet Comprising OAM and Protection Function

    Akihiko TANAKA  Atsushi IWAMURA  Masahiko MIZUTANI  Yoshihiro ASHI  

     
    PAPER

      Vol:
    E89-B No:3
      Page(s):
    668-674

    Ethernet is widely used and has been adopted in access and metro networks, both for new native Ethernet services and for its economic advantages. Apart from these native Ethernet applications, attention has turned to an encapsulation technology for transporting legacy services over Ethernet, i.e. TDM over Ethernet. In order to apply it to carrier networks, it is necessary to meet Quality of Service (QoS) requirements, and consideration of operation, administration and maintenance (OAM) aspects is indispensable. Furthermore, for higher reliability, a protection function must be applied to the networks. We have studied the encapsulation method of TDM signals applied to a circuit emulator accommodating TDM signals over Ethernet. In addition, the OAM mechanism and the protection function are studied. This paper shows the frame format, the details of the OAM mechanism and the protection function, and introduces a developed circuit for adaptation of TDM over Ethernet.

  • An Attack on the Identity-Based Key Agreement Protocols in Multiple PKG Environment

    JoongHyo OH  SangJae MOON  Jianfeng MA  

     
    LETTER-Information Security

      Vol:
    E89-A No:3
      Page(s):
    826-829

    At ICCSA 2005, Lee et al. proposed the first identity-based key agreement protocols for a multiple PKG environment where each PKG has different domain parameters. However, this letter demonstrates that Lee et al.'s scheme does not provide implicit key authentication, which is a fundamental security requirement, making it vulnerable to an impersonation attack.

  • Design of Equiripple Minimum Phase FIR Filters with Ripple Ratio Control

    Masahiro OKUDA  Masaaki IKEHARA  Shin-ichi TAKAHASHI  

     
    PAPER-Digital Signal Processing

      Vol:
    E89-A No:3
      Page(s):
    751-756

    In this paper, we present a numerical method for the equiripple approximation of minimum phase FIR digital filters. Many methods have been proposed for the design of such filters. Many of them first design a linear phase filter whose length is twice as long, and then factorize the filter to obtain the minimum phase filter. Although these methods theoretically guarantee optimality, it is difficult to control the ratio of ripples between different bands. In the conventional lowpass filter design, for example, when different weights are given for its passband and stopband, one needs to iteratively design the filter by trial and error to achieve the ratio of the weights exactly. To address this problem, we modify the well-known Parks-McClellan algorithm and make it possible to directly control the ripple ratios. The method iteratively solves a set of linear equations while controlling the ratio of ripples. Using this method, the equiripple solutions are obtained quickly.

  • Multi-Ported Register File for Reducing the Impact of PVT Variation

    Yuuichirou IKEDA  Masaya SUMITA  Makoto NAGATA  

     
    PAPER-Signal Integrity and Variability

      Vol:
    E89-C No:3
      Page(s):
    356-363

    We have developed a 32-bit, 32-word register file with 9 read ports and 7 write ports. This register file has several circuits and techniques for reducing the impact of process variation, which is pronounced in recent process technologies, as well as voltage variation and temperature variation, together known as PVT variation. We describe these circuits and techniques in detail, and confirm their effects by simulation and measurement of the test chip.

  • Aperture-Backed Microstrip-Line Stepped-Impedance Resonators and Transformers for Performance-Enhanced Bandpass Filters

    Hang WANG  Lei ZHU  

     
    PAPER-Microwaves, Millimeter-Waves

      Vol:
    E89-C No:3
      Page(s):
    403-409

    A novel class of microstrip bandpass filter is configured using impedance transformers and an improved stepped impedance resonator (SIR). This SIR is composed of a central narrow strip section with an aperture on ground and two wide strip sections at the two sides. This low-high-low SIR has a promising capability in achieving an extremely large ratio of the first two resonant frequencies for design of a bandpass filter with ultra-broad stopband. The two quarter-wavelength transformers with low and high impedances, referred to as impedance- and admittance-inverters, are modeled and utilized as alternative types of inductive and capacitive coupling elements with highly tightened degrees for wideband filter design. After extensive investigation is made on the two transformers and the proposed SIR, the two novel bandpass filters are constructed, designed and implemented. Two sets of predicted and measured frequency responses over a wide frequency range both quantitatively exhibit their several attractive features, such as ultra-broad stopband with deep rejection and broadened dominant passband with low insertion loss.

  • Feedforward Active Substrate Noise Cancelling Based on di/dt of Power Supply

    Toru NAKURA  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Signal Integrity and Variability

      Vol:
    E89-C No:3
      Page(s):
    364-369

    This paper demonstrates a feedforward active substrate noise cancelling technique using a power supply di/dt detector. Since the substrate is usually tied to the ground line through a low impedance, the substrate noise is closely related to the ground bounce, which is proportional to the di/dt when inductance dominates the ground line impedance. Our active cancelling detects the di/dt of the power supply, and injects an anti-phase current into the substrate so that the di/dt-proportional substrate noise is cancelled out. Our first trial shows that 34% substrate noise reduction is achieved on our test circuit, and the theoretical analysis shows that the optimized canceller design will enhance the substrate noise suppression ratio up to 56%.

  • Reducing Consuming Clock Power Optimization of a 90 nm Embedded Processor Core

    Tetsuya YAMADA  Masahide ABE  Yusuke NITTA  Kenji OGURA  Manabu KUSAOKE  Makoto ISHIKAWA  Motokazu OZAWA  Kiwamu TAKADA  Fumio ARAKAWA  Osamu NISHII  Toshihiro HATTORI  

     
    PAPER-Low Power Techniques

      Vol:
    E89-C No:3
      Page(s):
    287-294

    A low-power SuperH™ embedded processor core, the SH-X2, has been designed in 90-nm CMOS technology. The power consumption was reduced by using hierarchical fine-grained clock gating to reduce the power consumption of the flip-flops and the clock-tree, synthesis and a layout that supports the implementation of the clock gating, and several-level power evaluations for RTL refinement. With this clock gating and RTL refinement, the power consumption of the clock-tree and flip-flops was reduced by 35% and 59%, including the process shrinking effects, respectively. As a result, the SH-X2 achieved 6,000 MIPS/W using a Renesas low-power process with a lowered voltage. Its performance-power efficiency was 25% better than that of a 130-nm-process SH-X.

  • Two Schemes for an Overloaded Space-Time Spreading System over a Flat Rayleigh Fading MIMO Channel

    Dianjun CHEN  Takeshi HASHIMOTO  

     
    PAPER-Spread Spectrum Technologies and Applications

      Vol:
    E89-A No:3
      Page(s):
    798-806

    We propose two sequence design schemes for an overloaded space-time spreading system with multiple antennas. One scheme is for a system in which the amplitude of user signals need not be adjusted and provides tradeoffs between the user capacity and diversity order. This scheme has a certain similarity to time-sharing, but its performance is further improved by time-diversity. The other achieves full diversity order by varying user signal amplitudes. The diversity orders of the respective schemes are theoretically proved and their performances are demonstrated by simulation.

  • A Robust Detector for Rapid Code Acquisition in Non-Gaussian Impulsive Channels

    Seokho YOON  Suk Chan KIM  Sun Yong KIM  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    809-815

    Recently, a novel detector was proposed by the authors for code acquisition in non-Gaussian impulsive channels [3], which dramatically outperforms the conventional squared-sum detector; however, it requires exact knowledge of the non-Gaussian noise dispersion. In this paper, a robust detector is proposed, which employs the signs and ranks of the received signal samples, instead of their actual values, and so does not require knowledge of the non-Gaussian noise dispersion. The acquisition performance of the proposed detector is compared with that of the detector of [3] in terms of the mean acquisition time. The simulation results show that the proposed scheme is not only robust to deviations from the true value of the non-Gaussian noise dispersion, but also has comparable performance to that of the scheme of [3] using exact knowledge of the non-Gaussian noise dispersion.

  • Visible Light Communication with LED Traffic Lights Using 2-Dimensional Image Sensor

    Haswani BINTI CHE WOOK  Shinichiro HARUYAMA  Masao NAKAGAWA  

     
    PAPER-Communications

      Vol:
    E89-A No:3
      Page(s):
    654-659

    We propose a new receiving method for an information-providing system that uses LED-based traffic lights as the transmitter. We analyzed the improvements obtained when a 2-dimensional image sensor replaced the conventional single-element photodiode. First, we discuss the maximum receiver's field of view (FOV) when using the 2-dimensional image sensor at a particular focal length. We analyzed the best vertical inclination for both lanes and quantified the improvements in terms of the enhancement of received signal-to-noise ratio (SNR) when different numbers of pixels were applied. Our results indicate that using more pixels increases the received SNR and the service area becomes wider compared to the conventional single-element system. Consequently, receivable information within the service area also increased. We also found that the optimum number of pixels to accomplish a reliable communication system is 50×50 because performance degradation occurred with a larger number of pixels.
