
Keyword Search Result

[Keyword] EE(4073hit)

2501-2520hit(4073hit)

  • Substring Count Estimation in Extremely Long Strings

    Jinuk BAE  Sukho LEE  

     
    PAPER-Database

      Vol:
    E89-D No:3
      Page(s):
    1148-1156

    To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although the trees are useful for substring count estimation in short data strings (e.g. name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, it becomes too hard or at least slow to build CS-trees, because their origin, the suffix tree, has a memory-bottleneck problem with long strings. Second, some CS-tree node counts are incorrect due to frequent pruning of nodes. Therefore, we propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (or length-q substrings), CQ-trees can be created fast and correctly within small available memory. Furthermore, we mathematically provide the lower and upper bounds that the count estimation can reach. To the best of our knowledge, our work is the first to present such bounds among research activities to estimate alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.

  • Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

    William BYRNE  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    900-907

    Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition (LVCSR) systems in two ways: through the estimation of the parameters of the underlying hidden Markov models (HMMs), and through the identification of smaller recognition tasks, which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs', showing in particular that this process of subproblem identification makes it possible to train and apply small-domain binary pattern classifiers, such as Support Vector Machines, to large vocabulary continuous speech recognition.

  • Circuits for CMOS High-Speed I/O in Sub-100 nm Technologies

    Hirotaka TAMURA  Masaya KIBUNE  Hisakatsu YAMAGUCHI  Kouichi KANDA  Kohtaroh GOTOH  Hideki ISHIDA  Junji OGAWA  

     
    INVITED PAPER

      Vol:
    E89-C No:3
      Page(s):
    300-313

    The paper provides an overview of the circuit techniques for CMOS high-speed I/Os, focusing on the design issues in sub-100 nm standard CMOS. First, we describe the evolution of CMOS high-speed I/O since it appeared in the mid-90s. In our view, the surge in the I/O bandwidth we experienced from the mid-90s to the present was driven by the continuous improvement of the CMOS IC performance. As a result, CMOS high-speed I/O has covered data rates ranging from 2.5 Gb/s to 10 Gb/s, and is now heading for 40 Gb/s and beyond. To meet the speed requirements, an optimum choice of the transceiver architecture and its building blocks is crucial. We pick the most critical building blocks, such as the decision circuit and the multiplexors, and give a detailed explanation of their designs. We describe the low-voltage operation of the high-speed I/O in view of reducing the power consumption. An example of a 90-nm CMOS 2.5 Gb/s transceiver operating off a 0.8 V power supply will be described. Operability at 0.8 V ensures that the circuits will not become obsolete, even below the 60 nm process node.

  • Training Augmented Models Using SVMs

    Mark J.F. GALES  Martin I. LAYTON  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    892-899

    There has been significant interest in developing new forms of acoustic model, in particular models which allow more dependencies to be represented than those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows more dependencies within the data to be modelled than are represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM, generative kernel, framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.

  • A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging

    Masakiyo FUJIMOTO  Satoshi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    922-930

    This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech recognition in noise. In the proposed method, a noise sequence is estimated in three stages: a sequential importance sampling step, a residual resampling step, and finally a Markov chain Monte Carlo step with Metropolis-Hastings sampling. The estimated noise sequence is used in the MMSE-based clean speech estimation. We also introduce Polyak averaging and feedback into a state transition process for particle filtering. In the evaluation results, we observed that the proposed method improves speech recognition accuracy in non-stationary noise environments compared with a noise compensation method that assumes stationary noise.

  • Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

    Tetsuji OGAWA  Tetsunori KOBAYASHI  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    939-945

    Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, it is expected that the optimal structure which gives the best performance differs from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore it gives model structures with good discriminative performance. Among all possible structure combinations, we define the combination that best satisfies the WLRM criterion as the optimal structures. A genetic algorithm is also applied as an adequate approximation of a full search. With results of continuous lecture talk speech recognition, the effectiveness of the proposed structure optimization is shown: it reduced the word errors compared to HMM and PHMM with a common structure for all models.

  • A Fast Fractal Image Compression Algorithm Based on Average-Variance Function

    ChenGuang ZHOU  Kui MENG  ZuLian QIU  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E89-D No:3
      Page(s):
    1303-1308

    In order to improve the efficiency and speed of match seeking in fractal compression, this paper presents an Average-Variance function which can make the optimal choice more efficiently. Based on it, we also present a fast optimal-choice fractal image compression algorithm and an optimal method of constructing a data tree, which greatly improve the performance of the algorithm. Analysis and experimental results show that it can improve PSNR by over 1 dB and improve the coding speed by 30-40% or more compared with ordinary optimal-choice algorithms, such as the algorithm based on center of gravity and the algorithm based on variance. It can offer much higher optimal-choice efficiency, higher reconstruction quality and rapid speed. It is a fast fractal encoding algorithm with high performance.

  • Channel-Count-Independent BIST for Multi-Channel SerDes

    Kouichi YAMAGUCHI  Muneo FUKAISHI  

     
    PAPER-Interface and Interconnect Techniques

      Vol:
    E89-C No:3
      Page(s):
    314-319

    This paper describes a BIST circuit for testing SoC integrated multi-channel serializer/deserializer (SerDes) macros. A newly developed packet-based PRBS generator enables the BIST to perform at-speed testing of asynchronous data transfers. In addition, a new technique for chained alignment checks between adjacent channels helps achieve a channel-count-independent architecture for verification of multi-channel alignment between SerDes macros. Fabricated in a 0.13-µm CMOS process and operating at > 500 MHz, the BIST has successfully verified all SerDes functions in at-speed testing of 5-Gbps 20-ch SerDes macros.

  • Quantum Noise and Feed-Back Noise in Blue-Violet InGaN Semiconductor Lasers

    Kenjiro MATSUOKA  Kazushi SAEKI  Eiji TERAOKA  Minoru YAMADA  Yuji KUWAMURA  

     
    LETTER-Lasers, Quantum Electronics

      Vol:
    E89-C No:3
      Page(s):
    437-439

    Properties of the quantum noise and the optical feedback noise in blue-violet InGaN semiconductor lasers were measured in detail. We confirmed that the quantum noise in the blue-violet laser is higher than that in the near-infrared laser. This property is intrinsic, being based on the principles of quantum mechanics, and is a severe problem when applying the laser to optical disks with low power consumption. The feedback noise was classified into two types, "low frequency type" and "flat type", based on the frequency spectrum of the noise. This classification was the same as that in near-infrared lasers.

  • Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework

    Shinji WATANABE  Atsushi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    970-980

    We introduce a robust classification method based on the Bayesian predictive distribution (Bayesian Predictive Classification, referred to as BPC) for speech recognition. We and others have recently proposed a total Bayesian framework named Variational Bayesian Estimation and Clustering for speech recognition (VBEC). VBEC includes the practical computation of approximate posterior distributions that are essential for BPC, based on variational Bayes (VB). BPC using VB posterior distributions (VB-BPC) provides an analytical solution for the predictive distribution as the Student's t-distribution, which can mitigate the over-training effects by marginalizing the model parameters of an output distribution. We address the sparse data problem in speech recognition, and show experimentally that VB-BPC is robust against data sparseness.

  • Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement

    Tran Huy DAT  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1040-1049

    This study shows the effectiveness of using the gamma distribution in the speech power domain as a more general prior distribution for model-based speech enhancement approaches. This model is a super-set of the conventional Gaussian model of the complex spectrum and provides more accurate prior modeling when the optimal parameters are estimated. We develop a method to adapt the modeled distribution parameters from each actual noisy speech in a frame-by-frame manner. Next, we derive and investigate the minimum mean square error (MMSE) and maximum a posteriori probability (MAP) estimations in different domains of speech spectral magnitude, generalized power and its logarithm, using the proposed gamma modeling. Finally, a comparative evaluation of the MAP and MMSE filters is conducted. Whereas the MMSE estimations tend to become more complicated with more general prior distributions, the MAP estimations are given in closed form and are therefore suitable for implementation. The adaptive estimation of the modeled distribution parameters provides more accurate prior modeling, and this is the principal merit of the proposed method and the reason for the better performance. From the experiments, the MAP estimation is recommended due to its high efficiency and low complexity. Among the MAP-based systems, the estimation in the log-magnitude domain is shown to be the best for speech recognition, whereas the estimation in the power domain is superior for noise reduction.

  • A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

    Makoto TACHIBANA  Junichi YAMAGISHI  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis

      Vol:
    E89-D No:3
      Page(s):
    1092-1099

    This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique, called style adaptation, deals with this issue. Firstly, the maximum likelihood linear regression (MLLR) algorithm, based on a framework of the hidden semi-Markov model (HSMM), is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is also presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining naturalness of the synthetic speech.

  • Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

    Takashi SAITO  

     
    PAPER-Speech Analysis

      Vol:
    E89-D No:3
      Page(s):
    1100-1106

    This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.

  • Implementation and Evaluation of an HMM-Based Korean Speech Synthesis System

    Sang-Jin KIM  Jong-Jin KIM  Minsoo HAHN  

     
    LETTER

      Vol:
    E89-D No:3
      Page(s):
    1116-1119

    The development of a hidden Markov model (HMM)-based Korean speech synthesis system and its evaluation are described. Statistical HMM models for Korean speech units are trained with a hand-labeled speech database including contextual information about phoneme, morpheme, word phrase, utterance, and break strength. The developed system produced speech with a fairly good prosody. The synthesized speech is evaluated and compared with that of our corpus-based unit concatenating Korean text-to-speech system. The two systems were trained with the same manually labeled speech database.

  • Production-Oriented Models for Speech Recognition

    Erik MCDERMOTT  Atsushi NAKAMURA  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    1006-1014

    Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof; state-to-state evolution is modeled only crudely, with no explicit relationship between states, such as would be afforded by the use of phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article attempts to provide an overview of proposals by several researchers for improving acoustic modeling in these regards. Such topics as the controversial Motor Theory of Speech Perception, work by Hogden explicitly using a continuity constraint in a pseudo-articulatory domain, the Kalman filter based Hidden Dynamic Model, and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech, will be covered.

  • Lower MAC Software Implementations for the IEEE 802.16 Standard

    Ioannis PAPAIOANNOU  Chrissavgi DRE  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E89-B No:3
      Page(s):
    816-827

    In this paper the development of the control plane for the frame decoding functionality of an IEEE 802.16 Wireless MAN system is described. It is implemented in two ways. The first implementation is based on a general-purpose microprocessor, specifically the one provided in the Texas Instruments TMS320C64xx family devices. The second implementation is based on Intel's IXP2400 Network Processor chip, and the preceding functions are implemented by writing embedded software for that part. The two implementations are compared and the comparison leads to some very useful results. The development of time-critical tasks of a MAC protocol stack in software, mainly based on a Network Processor, opens paths for very effective system architectures, where the Network Processor runs the full networking and MAC/DLC processing of such telecom systems. The main question is: Can lower MAC be executed on a Network Processor or not? This manuscript attempts to give an answer to this question.

  • Error Identification in At-Speed Scan BIST Environment in the Presence of Circuit and Tester Speed Mismatch

    Yoshiyuki NAKAMURA  Thomas CLOUQUEUR  Kewal K. SALUJA  Hideo FUJIWARA  

     
    PAPER-Dependable Computing

      Vol:
    E89-D No:3
      Page(s):
    1165-1172

    In this paper, we provide a practical formulation of the problem of identifying all error occurrences and all failed scan cells in an at-speed scan-based BIST environment. We propose a method that can be used to identify every error when the circuit test frequency is higher than the tester frequency. Our approach requires very little extra hardware for diagnosis, and the test application time required to identify errors is a linear function of the frequency ratio between the CUT and the tester.

  • What HMMs Can Do

    Jeff A. BILMES  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    869-891

    Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems--today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial article analyzes HMMs by exploring a definition of HMMs in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in search of a model to supersede the HMM (say for ASR), rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.

  • Expressive Power of Quantum Pushdown Automata with Classical Stack Operations under the Perfect-Soundness Condition

    Masaki NAKANISHI  Kiyoharu HAMAGUCHI  Toshinobu KASHIWABARA  

     
    PAPER-Computation and Computational Models

      Vol:
    E89-D No:3
      Page(s):
    1120-1127

    One important question for quantum computing is whether a computational gap exists between models that are allowed to use quantum effects and models that are not. Several types of quantum computation models have been proposed, including quantum finite automata and quantum pushdown automata (with a quantum pushdown stack). It has been shown that some quantum computation models are more powerful than their classical counterparts and others are not, since quantum computation models are required to obey such restrictions as reversible state transitions. In this paper, we investigate the power of quantum pushdown automata whose stacks are assumed to be implemented as classical devices, and show that they are strictly more powerful than their classical counterparts under the perfect-soundness condition, where perfect soundness means that an automaton never accepts a word that is not in the language. That is, we show that our model can simulate any probabilistic pushdown automaton, and also show that there is a non-context-free language which quantum pushdown automata with classical stack operations can recognize with perfect soundness.

  • Robust Speech Recognition by Using Compensated Acoustic Scores

    Shoei SATO  Kazuo ONOE  Akio KOBAYASHI  Toru IMAI  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    915-921

    This paper proposes a new compensation method for acoustic scores in the Viterbi search for robust speech recognition. This method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. This method uses likelihoods of noise models in two ways. One is to calculate a confidence factor for each input frame by comparing likelihoods of speech models and noise models. Then the weight of the acoustic score for a noisy frame is reduced according to the value of the confidence factor for compensation. The other is to use the likelihood of a noise model as an alternative to that of a silence model when given noisy input. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, and this is expected to lead to an improvement in metadata extraction accuracy.
