
Keyword Search Result

[Keyword] SC(4570hit)

2521-2540hit(4570hit)

  • Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

    Ian R. LANE  Tatsuya KAWAHARA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    931-938

    Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.

  • Single-Channel Multiple Regression for In-Car Speech Enhancement

    Weifeng LI  Katsunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1032-1039

    We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on the nonlinear regression of the log spectra of the noisy signal captured by a distant microphone and the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation of the regression-enhanced speech, and performs best in most objective measures. Based on our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach shows average relative word error rate (WER) reductions of 50.8% and 13.1% compared to the original noisy speech and the ETSI advanced front-end (ETSI ES 202 050), respectively.

  • System LSI: Challenges and Opportunities

    Tadahiro KURODA  

     
    INVITED PAPER

      Vol:
    E89-C No:3
      Page(s):
    213-220

    Scaling of CMOS integrated circuits is becoming difficult, due mainly to the rapid increase in power dissipation. How will semiconductor technology and the industry develop? This paper discusses challenges and opportunities in system LSI from three levels of perspective: transistor level (physics), IC level (electronics), and business level (economics).

  • Hybrid SC/MRRC Technique for OFDM Systems

    Won Gi JEON  Hyeok Koo JUNG  

     
    LETTER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    1003-1006

    In this letter, a hybrid selection combining (SC) and maximal ratio receive combining (MRRC) technique is proposed for orthogonal frequency-division multiplexing (OFDM) systems with multiple receive antennas. The proposed technique still uses multiple receive antennas, but it has just a single RF front-end and a single baseband demodulator. In comparison with the OFDM system with no diversity, we can achieve superior gain irrespective of bandwidth efficiency, and also in comparison with the MRRC OFDM, we can achieve better gain under the bandwidth efficiency of 3 bps/Hz at the bit error rate of 10^-6.
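
    The abstract does not give the details of the hybrid scheme, so the following is only a minimal sketch of the two textbook building blocks it names, applied to a single subcarrier: selection combining keeps the strongest branch, while maximal ratio combining weights every branch by the conjugate of its channel coefficient. The function names, the example channel values, and the assumption of equal noise power on each branch are illustrative and are not taken from the letter.

```python
def selection_combine(h, r):
    """Selection combining: equalize only the branch with the largest channel gain."""
    best = max(range(len(h)), key=lambda i: abs(h[i]) ** 2)
    return r[best] / h[best]


def maximal_ratio_combine(h, r):
    """Maximal ratio combining: weight each branch by the conjugate of its channel."""
    num = sum(hi.conjugate() * ri for hi, ri in zip(h, r))
    den = sum(abs(hi) ** 2 for hi in h)
    return num / den


def output_snr_gain(h):
    """Post-combining SNR gain for (SC, MRC), assuming equal noise power per branch."""
    gains = [abs(hi) ** 2 for hi in h]
    return max(gains), sum(gains)


# Hypothetical two-antenna example without noise: both combiners recover the symbol.
h = [0.5 + 0.5j, 1.0 - 0.2j]   # assumed channel coefficients on one subcarrier
s = 1 + 1j                     # transmitted symbol
r = [hi * s for hi in h]       # received samples, one per antenna
sc_gain, mrc_gain = output_snr_gain(h)
```

    Under these assumptions the MRC gain is the sum of the branch gains and the SC gain is their maximum, so MRC is never worse in SNR; the appeal of selection, as the abstract notes, is that a single RF front-end and demodulator suffice.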

  • Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

    William BYRNE  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    900-907

    Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition systems through the estimation of the parameters of the underlying hidden Markov models and through the identification of smaller recognition tasks which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs', showing in particular that this process of subproblem identification makes it possible to train and apply small-domain binary pattern classifiers, such as Support Vector Machines, to large vocabulary continuous speech recognition.

  • A Priority-Based Packet Scheduling Architecture for Integrated Services Networks

    Junni ZOU  Hongkai XIONG  Rujian LIN  

     
    LETTER

      Vol:
    E89-B No:3
      Page(s):
    704-708

    To simultaneously support guaranteed real-time services and best-effort service, a Priority-based Scheduling Architecture (PSA) designed for high-speed switches is proposed. PSA divides packet scheduling into a high-priority phase and a low-priority phase. In the high-priority phase, an improved sorted-priority algorithm is presented. It introduces a new constraint into the scheduling discipline to overcome bandwidth preemption. Meanwhile, the virtual time function with a control factor α is employed. Both computer simulation results and theoretical analysis show that the PSA mechanism has excellent performance in terms of implementation complexity, fairness and delay properties.

  • Grayscale Image Segmentation Using Color Space

    Takahiko HORIUCHI  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E89-D No:3
      Page(s):
    1231-1237

    A novel approach is proposed for segmenting grayscale images that were originally color scenes. Many algorithms have been developed for grayscale image segmentation. All of these approaches have been discussed in a luminance space, because grayscale images have been considered to carry no color information. However, a luminance value carries color information in the form of the set of colors that correspond to it. In this paper, an inverse mapping of luminance values to the CIELAB color space is carried out, and grayscale image segmentation is performed based on a distance in the color space. The proposed scheme is applied to region-growing segmentation and its performance is verified.

  • Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework

    Sakriani SAKTI  Satoshi NAKAMURA  Konstantin MARKOV  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    946-953

    Over the last decade, the Bayesian approach has increased in popularity in many application areas. It uses a probabilistic framework which encodes our beliefs or actions in situations of uncertainty. Information from several models can also be combined based on the Bayesian framework to achieve better inference and to better account for modeling uncertainty. The approach we adopted here is to utilize the benefits of the Bayesian framework to improve acoustic model precision in speech recognition systems by modeling a wider-than-triphone context, approximating it using several less context-dependent models. Such a composition was developed in order to avoid the crucial problem of limited training data and to reduce the model complexity. To enhance model reliability in the face of unseen contexts and limited training data, flooring and smoothing techniques are applied. Experimental results show that the proposed Bayesian pentaphone model improves word accuracy in comparison with the standard triphone model.

  • High Speed 3D IR Scanner for Home Service Robots

    Jehyuk RYU  Sungho YUN  Kyungjin SONG  Jundong CHO  Jongmoo CHOI  Sukhan LEE  

     
    PAPER-Image/Vision Processing

      Vol:
    E89-A No:3
      Page(s):
    678-685

    This paper introduces a hardware platform for structured-light processing based on depth imaging, which performs 3D modeling of cluttered workspaces for home service robots. We have discovered that the degradation of precision and robustness comes mainly from the overlapping of multiple codes in the signal received at a camera pixel. Considering how critical separating the overlapped codes is to precision and robustness, we proposed a novel signal separation code, referred to here as "Hierarchically Orthogonal Code (HOC)," for depth imaging. The proposed HOC algorithm was implemented on a hardware platform that uses the Xilinx XC2V6000 FPGA to perform real-time 3D modeling and invisible IR (infrared) pattern light to avoid any inconvenience in the home environment. The experimental results have shown that the proposed HOC algorithm significantly enhances the robustness and precision of depth imaging, compared to the best known conventional approaches. Furthermore, the HOC algorithm implemented on our hardware platform required 34 ms to generate one 3D image. This is about 24 times faster than a software implementation of the same HOC algorithm, and real-time processing is realized.

  • Module-Wise Dynamic Voltage and Frequency Scaling for a 90 nm H.264/MPEG-4 Codec LSI

    Yukihito OOWAKI  Shinichiro SHIRATAKE  Toshihide FUJIYOSHI  Mototsugu HAMADA  Fumitoshi HATORI  Masami MURAKATA  Masafumi TAKAHASHI  

     
    INVITED PAPER

      Vol:
    E89-C No:3
      Page(s):
    263-270

    The module-wise dynamic voltage and frequency scaling (MDVFS) scheme is applied to a single-chip H.264/MPEG-4 audio/visual codec LSI. The power consumption of the target module with controlled supply voltage and frequency is reduced by 40% in comparison with the operation without voltage or frequency scaling. The consumed power of the chip is 63 mW in decoding QVGA H.264 video at 15 fps and MPEG-4 AAC LC audio simultaneously. This LSI keeps operating continuously even during the voltage transition of the target module by introducing the newly developed dynamic de-skewing system (DDS), which watches and controls the clock edge of the target module.

  • A Simple Method for Detecting Tumor in T2-Weighted MRI Brain Images: An Image-Based Analysis

    Phooi-Yee LAU  Shinji OZAWA  

     
    PAPER-Biological Engineering

      Vol:
    E89-D No:3
      Page(s):
    1270-1279

    The objective of this paper is to present a decision support system which uses a computer-based procedure to detect tumor blocks or lesions in digitized medical images. The authors developed a simple method with a low computation effort to detect tumors on T2-weighted Magnetic Resonance Imaging (MRI) brain images, focusing on the connection between the spatial pixel value and tumor properties from four different perspectives: 1) cases having minuscule differences between two images using a fixed block-based method, 2) tumor shape and size using the edge and binary images, 3) tumor properties based on texture values using spatial pixel intensity distribution controlled by a global discriminate value, and 4) the occurrence of content-specific tumor pixel for threshold images. Measurements of the following medical datasets were performed: 1) different time interval images, and 2) different brain disease images on single and multiple slice images. Experimental results have revealed that our proposed technique incurred an overall error smaller than those in other proposed methods. In particular, the proposed method allowed decrements of false alarm and missed alarm errors, which demonstrate the effectiveness of our proposed technique. In this paper, we also present a prototype system, known as PCB, to evaluate the performance of the proposed methods by actual experiments, comparing the detection accuracy and system performance.

  • Analysis of Large-Scale Periodic Array Antennas by CG-FFT Combined with Equivalent Sub-Array Preconditioner

    Huiqing ZHAI  Qiang CHEN  Qiaowei YUAN  Kunio SAWAYA  Changhong LIANG  

     
    PAPER-Antennas and Propagation

      Vol:
    E89-B No:3
      Page(s):
    922-928

    This paper presents a method that offers fast and accurate analysis of large-scale periodic array antennas by the conjugate-gradient fast Fourier transform (CG-FFT) combined with an equivalent sub-array preconditioner. The method of moments (MoM) is used to discretize the electric field integral equation (EFIE) and form the impedance matrix equation. By properly dividing a large array into equivalent sub-blocks level by level, the impedance matrix takes the structure of three-level block Toeplitz matrices. The three-level block Toeplitz matrices are further transformed into circulant matrices, whose multiplication with a vector can be rapidly implemented by the one-dimensional (1-D) fast Fourier transform (FFT). Thus, the CG-FFT is successfully applied to the analysis of a large-scale periodic dipole array by speeding up the matrix-vector multiplication in the iterative solver. Furthermore, an equivalent sub-array preconditioner is proposed to combine with the CG-FFT analysis to reduce the number of iterative steps and the total CPU time of the iteration. Some numerical results are given to illustrate the high efficiency and accuracy of the present method.
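
    The step from a Toeplitz matrix to a circulant matrix and an FFT-based product can be illustrated with a small sketch. It covers only the single-level (scalar) Toeplitz case, not the three-level block structure or the preconditioner described in the abstract, and the function names are chosen here for illustration. The Toeplitz matrix is embedded in a larger circulant matrix whose first column is padded to a power-of-two length, and the product is then obtained from element-wise multiplication in the Fourier domain.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def toeplitz_matvec_fft(col, row, v):
    """y = T v, where T is Toeplitz with first column col and first row row."""
    n = len(v)
    m = 1
    while m < 2 * n - 1:          # circulant size: power of two, at least 2n-1
        m *= 2
    c = [0j] * m                  # first column of the embedding circulant
    for k in range(n):
        c[k] = col[k]
    for k in range(1, n):
        c[m - k] = row[k]
    v_pad = list(v) + [0] * (m - n)
    spectrum = [a * b for a, b in zip(fft(c), fft(v_pad))]
    y = fft(spectrum, inverse=True)
    return [val / m for val in y[:n]]   # normalize the inverse, keep first n entries


def toeplitz_matvec_direct(col, row, v):
    """Reference O(n^2) product used to check the FFT version."""
    n = len(v)
    return [sum((col[i - j] if i >= j else row[j - i]) * v[j] for j in range(n))
            for i in range(n)]
```

    The FFT route costs O(m log m) per product instead of O(n^2), which is why it pays off inside an iterative solver such as conjugate gradient, where the same matrix multiplies many vectors.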

  • Depth Perception from a 2D Natural Scene Using Scale Variation of Texture Patterns

    Yousun KANG  Hiroshi NAGAHASHI  

     
    LETTER-Pattern Recognition

      Vol:
    E89-D No:3
      Page(s):
    1294-1298

    In this paper, we introduce a new method for depth perception from a 2D natural scene using scale variation of patterns. As the surface in a 2D scene gets farther away from us, the texture appears finer and smoother. Texture gradient is one of the monocular depth cues which can be represented by gradual scale variations of textured patterns. To extract feature vectors from textured patterns, higher order local autocorrelation functions are utilized at each scale step. Hierarchical linear discriminant analysis is employed to classify the scale rate of the feature vector, which can be divided into subspaces by recursively grouping the overlapped classes. In the experiment, relative depth perception of 2D natural scenes is performed with the proposed method, which is expected to play an important role in natural scene analysis.

  • Robust Speech Recognition by Using Compensated Acoustic Scores

    Shoei SATO  Kazuo ONOE  Akio KOBAYASHI  Toru IMAI  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    915-921

    This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. This method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. This method uses likelihoods of noise models in two ways. One is to calculate a confidence factor for each input frame by comparing likelihoods of speech models and noise models. Then the weight of the acoustic score for a noisy frame is reduced according to the value of the confidence factor for compensation. The other is to use the likelihood of the noise model as an alternative to that of a silence model when given noisy input. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, and this is expected to lead to an improvement in metadata extraction accuracy.
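
    The first use of the noise-model likelihoods can be pictured with a small sketch. The abstract does not state how the confidence factor is computed, so the logistic form, the `slope` parameter, and both function names below are assumptions made purely for illustration: the factor grows with the gap between the speech-model and noise-model log-likelihoods, and it scales the frame's acoustic log-score.

```python
import math


def frame_confidence(speech_loglik, noise_loglik, slope=1.0):
    """Hypothetical confidence in (0, 1): logistic function of the log-likelihood gap."""
    z = slope * (speech_loglik - noise_loglik)
    z = max(-50.0, min(50.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def compensated_score(acoustic_logscore, confidence):
    """Scale the acoustic log-score of one frame by its confidence factor."""
    return confidence * acoustic_logscore


# Two competing hypotheses with acoustic log-scores -10 and -14 on one frame.
clean = frame_confidence(speech_loglik=-20.0, noise_loglik=-28.0)   # speech-like frame
noisy = frame_confidence(speech_loglik=-25.0, noise_loglik=-23.0)   # noise-like frame
gap_clean = compensated_score(-10.0, clean) - compensated_score(-14.0, clean)
gap_noisy = compensated_score(-10.0, noisy) - compensated_score(-14.0, noisy)
```

    With these made-up numbers the score gap between the two hypotheses shrinks on the noise-like frame, which mirrors the compression effect the abstract describes: fewer hypotheses are pruned on acoustic evidence alone, and the language score carries more weight.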

  • Analysis of Reactance Oscillators Having Multi-Mode Oscillations

    Yoshihiro YAMAGAMI  Yoshifumi NISHIO  Akio USHIDA  

     
    PAPER-Circuit Theory

      Vol:
    E89-A No:3
      Page(s):
    764-771

    We consider oscillators consisting of a reactance circuit and a negative resistor. They may exhibit multi-mode oscillations around the anti-resonant frequencies of the reactance circuit. This kind of oscillator can be easily synthesized by setting the resonant and anti-resonant frequencies of the reactance circuits. However, it is not easy to analyze the oscillation phenomena, because they exhibit multiple oscillations that depend on the initial guesses. In this paper, we propose a Spice-oriented solution algorithm combining the harmonic balance method with the Newton homotopy method that can find the multiple solutions on the homotopy paths. In our analysis, the determining equations from the harmonic balance method are given by modified equivalent circuit models of "DC," "Cosine" and "Sine" circuits. The modified circuits can be solved by the STC (solution curve tracing circuit) simulator, where the multiple oscillations are found by the transient analysis of Spice. Thus, we need to derive neither the troublesome circuit equations nor the mathematical transformations required to obtain the determining equations. This makes the solution algorithm much simpler.

  • Wireless QoS for High-Speed CDMA Packet Cellular Systems--With Radio-Condition-Aware Admission Control and Resource Allocation Reflected Multistage Scheduling--

    Narumi UMEDA  Lan CHEN  Hidetoshi KAYAMA  

     
    PAPER-Terrestrial Radio Communications

      Vol:
    E89-B No:3
      Page(s):
    886-894

    Supporting diversified rates for real-time communications will become possible and essential with the rapidly increasing transmission rates provided by the 4th generation (4G) mobile communication systems. In this paper, a novel wireless Quality of Service (QoS) scheme suitable for broadband CDMA packet cellular systems with adaptive modulation coding is proposed and its characteristics are described. The proposed QoS scheme comprises several control factors laid on the MAC and RRC layers, and can be harmonized with IP-QoS. Two important control factors are proposed: radio-condition-aware admission control and resource allocation reflected multistage scheduling. Computer simulations and testbed experiments indicate that by using the radio-condition-aware admission control, stable and guaranteed service can be provided to real-time users regardless of the interference and the variation in the location of the mobile station. Moreover, resource allocation reflected multistage scheduling maintains guaranteed rates for real-time users and provides high resource utilization efficiency for best-effort users. Consequently, by using the proposed wireless QoS scheme, it is possible to provide users with high quality and diversified real-time services, on a packet based radio network for enhanced 3G and beyond.

  • Wrinkle Rendering of Terrain Models in Chinese Landscape Painting

    Der-Lor WAY  Zen-Chung SHIH  

     
    PAPER-Computer Graphics

      Vol:
    E89-D No:3
      Page(s):
    1238-1248

    Landscapes have been the main theme in Chinese painting for over one thousand years. Chinese ink painting is a form of non-photorealistic rendering. Terrain is the major subject in Chinese landscape painting, and surface wrinkles are important in conveying the orientation of mountains and contributing to the atmosphere. Over the centuries, masters of Chinese landscape painting have developed various kinds of wrinkles. This work develops a set of novel methods for rendering wrinkles in Chinese landscape painting. A three-dimensional terrain is drawn as an outline and wrinkles, using information on the shape, shade and orientation of the terrain's polygonal surface. The major contribution of this work lies in the modeling and implementation of six major types of wrinkles on the surface of terrain, using traditional Chinese brush techniques. Users can select a style of wrinkle and input parameters to control the desired effect. The proposed method then completes the painting process automatically.

  • Teeth Image Recognition for Biometrics

    Tae-Woo KIM  Tae-Kyung CHO  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E89-D No:3
      Page(s):
    1309-1313

    This paper presents a personal identification method based on BMME and LDA for images of teeth acquired in the anterior and posterior occlusion states. The method consists of teeth region extraction, BMME, and pattern recognition for the images acquired in the anterior and posterior occlusion states of the teeth. The two occlusions can provide a consistent teeth appearance in images, and BMME can reduce matching error in pattern recognition. Using teeth images can be beneficial in recognition because teeth, being rigid objects, cannot be deformed at the moment of image acquisition. In the experiments, the algorithm was successful in teeth recognition for personal identification of 20 people, which suggests that our method can contribute to multi-modal authentication systems.

  • Efficient Motion Vector Composition Algorithm by Activity Measurement for Downscaled Video Transcoder

    Ching-Ting HSU  Mei-Juan CHEN  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E89-B No:3
      Page(s):
    1036-1039

    When the frame size is downscaled for video transcoding, the new motion vector (MV) must be computed. This paper presents an algorithm to utilize the activity measurement by DC value and the number of non-zero quantized DCT coefficients in the residual macroblock to compose the motion vector. It can reduce the complexity for motion estimation and improve the performance of the spatial domain video transcoder.
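
    A common way to compose a motion vector in a 2:1 downscaling transcoder is an activity-weighted average of the four incoming macroblock vectors, scaled by the downscaling factor. The sketch below follows that general idea only; the abstract does not say how the DC value and the count of non-zero coefficients are combined, so the `activity` formula, the function names, and the example numbers are placeholders rather than the letter's algorithm.

```python
def activity(dc_value, nonzero_coeffs):
    """Placeholder activity measure built from the two quantities the abstract names."""
    return abs(dc_value) + nonzero_coeffs


def compose_motion_vector(mvs, activities, scale=2):
    """Activity-weighted average of the input MVs, divided by the downscaling factor."""
    total = sum(activities)
    if total == 0:
        # No residual activity anywhere: fall back to a plain average.
        weights = [1.0 / len(mvs)] * len(mvs)
    else:
        weights = [a / total for a in activities]
    x = sum(w * mv[0] for w, mv in zip(weights, mvs)) / scale
    y = sum(w * mv[1] for w, mv in zip(weights, mvs)) / scale
    return (x, y)


# Four macroblocks of the original frame that map to one downscaled macroblock.
mvs = [(4, 0), (4, 0), (8, 4), (4, 0)]
acts = [activity(1, 0), activity(0, 1), activity(-1, 1), activity(0, 0)]
composed = compose_motion_vector(mvs, acts)
```

    Weighting by residual activity lets the macroblocks whose prediction was hardest dominate the composed vector, and reusing the incoming vectors avoids a full motion search on the downscaled frame.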

  • Frequency Domain Multiplexing of TES Signals by Magnetic Field Summation

    Noriko Y. YAMASAKI  Yoh TAKEI  Kensuke MASUI  Kazuhisa MITSUDA  Toshimitsu MOROOKA  Satoshi NAKAYAMA  

     
    INVITED PAPER

      Vol:
    E89-C No:2
      Page(s):
    98-105

    In frequency-domain multiplexing (FDM) for TES signals, a magnetic field summation method utilizing a multi-input SQUID has the fundamental merit of small degradation of the signal-to-noise ratio. We formulated shifts of the operation point due to a common impedance and cross-talk currents. These effects are evaluated for several FDM methods, and the requirements for the bandwidth and filters are summarized. The design parameters of multi-input SQUIDs and flux-locked loop driving circuits are also presented.
