IEICE global.ieice.org Site

Keyword Search Result

[Keyword] EE(4073hit)

2501-2520hit(4073hit)

Substring Count Estimation in Extremely Long Strings
Jinuk BAE Sukho LEE

PAPER-Database

Vol:
E89-D No:3
Page(s):
1148-1156
To estimate the number of substring matches against string data, count suffix trees (CS-trees) have been used as a kind of alphanumeric histogram. Although these trees are useful for substring count estimation in short data strings (e.g., names or titles), they reveal several drawbacks when the target is changed to extremely long strings. First, building CS-trees becomes too hard, or at least slow, because their origin, the suffix tree, has a memory-bottleneck problem with long strings. Second, some CS-tree node counts are incorrect due to frequent pruning of nodes. Therefore, we propose the count q-gram tree (CQ-tree) as an alphanumeric histogram for long strings. By adopting q-grams (length-q substrings), CQ-trees can be created quickly and correctly within a small amount of available memory. Furthermore, we mathematically derive the lower and upper bounds that the count estimation can reach. To the best of our knowledge, our work is the first to present such bounds among research efforts on estimating alphanumeric selectivity. Our experimental study shows that the CQ-tree outperforms the CS-tree in terms of building time and accuracy.
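As a rough illustration of the q-gram idea in the abstract above, the sketch below counts every length-q substring in one pass and uses those counts to bound the number of occurrences of a longer query. It is not the CQ-tree from the paper: the names `qgram_counts` and `upper_bound_count` are hypothetical, and the bound shown (the minimum count over the query's q-grams) is only a simple generic upper bound, not the bounds derived by the authors.

```python
from collections import Counter


def qgram_counts(text: str, q: int) -> Counter:
    """Count every length-q substring of text in a single pass."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def upper_bound_count(query: str, counts: Counter, q: int) -> int:
    """Upper bound on the number of occurrences of query in the text.

    Every occurrence of query contains each of its q-grams, so the
    query cannot occur more often than its rarest q-gram.
    """
    if len(query) < q:
        raise ValueError("query must be at least q characters long")
    return min(counts[query[i:i + q]] for i in range(len(query) - q + 1))


text = "abracadabra"
counts = qgram_counts(text, 2)
bound = upper_bound_count("abra", counts, 2)
```

Only the q-gram table has to be kept in memory, which is the property that makes q-grams attractive for very long strings; the price is that counts for longer queries are bounded or estimated rather than stored exactly.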
Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition
William BYRNE

INVITED PAPER

Vol:
E89-D No:3
Page(s):
900-907
Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition (LVCSR) systems in two ways: through the estimation of the parameters of the underlying hidden Markov models, and through the identification of smaller recognition tasks, which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs', showing in particular that this process of subproblem identification makes it possible to train and apply small-domain binary pattern classifiers, such as Support Vector Machines, to large vocabulary continuous speech recognition.
Circuits for CMOS High-Speed I/O in Sub-100 nm Technologies
Hirotaka TAMURA Masaya KIBUNE Hisakatsu YAMAGUCHI Kouichi KANDA Kohtaroh GOTOH Hideki ISHIDA Junji OGAWA

INVITED PAPER

Vol:
E89-C No:3
Page(s):
300-313
The paper provides an overview of the circuit techniques for CMOS high-speed I/Os, focusing on the design issues in sub-100 nm standard CMOS. First, we describe the evolution of CMOS high-speed I/O since it appeared in the mid-90s. In our view, the surge in I/O bandwidth we experienced from the mid-90s to the present was driven by the continuous improvement of CMOS IC performance. As a result, CMOS high-speed I/O has covered data rates ranging from 2.5 Gb/s to 10 Gb/s, and is now heading for 40 Gb/s and beyond. To meet the speed requirements, an optimum choice of the transceiver architecture and its building blocks is crucial. We pick the most critical building blocks, such as the decision circuit and the multiplexors, and give a detailed explanation of their designs. We describe the low-voltage operation of the high-speed I/O in view of reducing the power consumption. An example of a 90-nm CMOS 2.5 Gb/s transceiver operating off a 0.8 V power supply will be described. Operability at 0.8 V ensures that the circuits will not become obsolescent, even below the 60 nm process node.
Training Augmented Models Using SVMs
Mark J.F. GALES Martin I. LAYTON

INVITED PAPER

Vol:
E89-D No:3
Page(s):
892-899
There has been significant interest in developing new forms of acoustic model, in particular models which allow additional dependencies to be represented than those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows additional dependencies within the data to be modelled than are represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM, generative kernel, framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.
A Non-stationary Noise Suppression Method Based on Particle Filtering and Polyak Averaging
Masakiyo FUJIMOTO Satoshi NAKAMURA

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
922-930
This paper addresses a speech recognition problem in non-stationary noise environments: the estimation of noise sequences. To solve this problem, we present a particle filter-based sequential noise estimation method for front-end processing of speech recognition in noise. In the proposed method, a noise sequence is estimated in three stages: a sequential importance sampling step, a residual resampling step, and finally a Markov chain Monte Carlo step with Metropolis-Hastings sampling. The estimated noise sequence is used in the MMSE-based clean speech estimation. We also introduce Polyak averaging and feedback into a state transition process for particle filtering. In the evaluation results, we observed that the proposed method improves speech recognition accuracy in non-stationary noise environments compared with a noise compensation method that assumes stationary noise.
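The residual resampling stage named in the abstract above can be sketched generically as follows. This is the textbook form of residual resampling, not the authors' implementation; the function name `residual_resample` and the `rng` parameter are assumptions made for illustration.

```python
import math
import random


def residual_resample(weights, rng=None):
    """Return particle indices chosen by residual resampling.

    Each particle i is first copied floor(n * w_i) times, where w_i is
    its normalized weight; the remaining slots are then drawn at random
    in proportion to the leftover (residual) weights.
    """
    rng = rng if rng is not None else random.Random()
    n = len(weights)
    total = sum(weights)
    norm = [w / total for w in weights]

    # Deterministic part: integer number of copies per particle.
    copies = [math.floor(n * w) for w in norm]
    indices = [i for i, c in enumerate(copies) for _ in range(c)]

    # Random part: fill the remaining slots from the residual weights.
    remaining = n - len(indices)
    if remaining > 0:
        residuals = [n * w - c for w, c in zip(norm, copies)]
        indices.extend(rng.choices(range(n), weights=residuals, k=remaining))
    return indices


resampled = residual_resample([0.5, 0.25, 0.125, 0.125], rng=random.Random(0))
```

Compared with plain multinomial resampling, the deterministic copies reduce the variance introduced by the resampling step, since only the residual part of each weight is left to chance.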
Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion
Tetsuji OGAWA Tetsunori KOBAYASHI

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
939-945
Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, the optimal structure, which gives the best performance, is expected to differ from category to category. In this paper, we design a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories, and therefore yields model structures with good discriminative performance. We define the optimal structures as the model structure combination that best satisfies the WLRM criterion among all possible structure combinations. A genetic algorithm is also applied as an adequate approximation of a full search. Results of continuous lecture talk speech recognition show the effectiveness of the proposed structure optimization: it reduced the word errors compared to HMM and PHMM with a common structure for all models.
A Fast Fractal Image Compression Algorithm Based on Average-Variance Function
ChenGuang ZHOU Kui MENG ZuLian QIU

LETTER-Image Processing and Video Processing

Vol:
E89-D No:3
Page(s):
1303-1308
In order to improve the efficiency and speed of match seeking in fractal compression, this paper presents an Average-Variance function which can make the optimal choice more efficiently. Based on it, we also present a fast optimal-choice fractal image compression algorithm and an optimal method of constructing a data tree, which greatly improve the performance of the algorithm. Analysis and experimental results show that it improves PSNR by over 1 dB and coding speed by over 30-40% compared with ordinary optimal-choice algorithms, such as the algorithm based on center of gravity and the algorithm based on variance. It offers much higher optimal-choice efficiency, higher reconstruction quality, and greater speed. It is a fast, high-performance fractal encoding algorithm.
Channel-Count-Independent BIST for Multi-Channel SerDes
Kouichi YAMAGUCHI Muneo FUKAISHI

PAPER-Interface and Interconnect Techniques

Vol:
E89-C No:3
Page(s):
314-319
This paper describes a BIST circuit for testing SoC-integrated multi-channel serializer/deserializer (SerDes) macros. A newly developed packet-based PRBS generator enables the BIST to perform at-speed testing of asynchronous data transfers. In addition, a new technique for chained alignment checks between adjacent channels helps achieve a channel-count-independent architecture for verification of multi-channel alignment between SerDes macros. Fabricated in a 0.13-µm CMOS process and operating at > 500 MHz, the BIST has successfully verified all SerDes functions in at-speed testing of 5-Gbps 20-ch SerDes macros.
Quantum Noise and Feed-Back Noise in Blue-Violet InGaN Semiconductor Lasers
Kenjiro MATSUOKA Kazushi SAEKI Eiji TERAOKA Minoru YAMADA Yuji KUWAMURA

LETTER-Lasers, Quantum Electronics

Vol:
E89-C No:3
Page(s):
437-439
Properties of the quantum noise and the optical feedback noise in blue-violet InGaN semiconductor lasers were measured in detail. We confirmed that the quantum noise in the blue-violet laser is higher than that in the near-infrared laser. This is an intrinsic property based on the principles of quantum mechanics, and is a serious issue when applying the laser to optical disks at low power consumption. The feedback noise was classified into two types, a "low frequency type" and a "flat type", based on the frequency spectrum of the noise. This classification is the same as that for near-infrared lasers.
Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework
Shinji WATANABE Atsushi NAKAMURA

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
970-980
We introduce a robust classification method based on the Bayesian predictive distribution (Bayesian Predictive Classification, referred to as BPC) for speech recognition. We and others have recently proposed a total Bayesian framework named Variational Bayesian Estimation and Clustering for speech recognition (VBEC). VBEC includes the practical computation of approximate posterior distributions that are essential for BPC, based on variational Bayes (VB). BPC using VB posterior distributions (VB-BPC) provides an analytical solution for the predictive distribution as the Student's t-distribution, which can mitigate the over-training effects by marginalizing the model parameters of an output distribution. We address the sparse data problem in speech recognition, and show experimentally that VB-BPC is robust against data sparseness.
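For readers unfamiliar with why a Student's t-distribution appears here, the following is the standard textbook result for a univariate Gaussian with a Normal-Gamma posterior. It is a generic illustration, not the VB-BPC derivation from the paper, and the hyperparameter symbols $m, \kappa, a, b$ are assumptions introduced only for this sketch.

```latex
% Predictive distribution: marginalize the model parameters over the posterior.
p(x \mid D) = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta

% Assumed posterior over mean \mu and precision \lambda (Normal-Gamma, rate b):
p(\mu, \lambda \mid D)
  = \mathcal{N}\!\left(\mu \mid m, (\kappa\lambda)^{-1}\right)
    \mathrm{Gam}(\lambda \mid a, b)

% Integrating out \mu and \lambda gives a Student's t-distribution with
% location m, precision a\kappa / (b(\kappa + 1)), and 2a degrees of freedom:
p(x \mid D)
  = \mathrm{St}\!\left(x \,\middle|\, m,\ \frac{a\,\kappa}{b\,(\kappa + 1)},\ 2a\right)
```

With little data the degrees of freedom $2a$ stay small, so the predictive distribution has heavier tails than a Gaussian, which is one way to see the robustness to sparse data that the abstract describes.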
Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement
Tran Huy DAT Kazuya TAKEDA Fumitada ITAKURA

PAPER-Speech Enhancement

Vol:
E89-D No:3
Page(s):
1040-1049
This study shows the effectiveness of using the gamma distribution in the speech power domain as a more general prior distribution for model-based speech enhancement approaches. This model is a superset of the conventional Gaussian model of the complex spectrum and provides more accurate prior modeling when the optimal parameters are estimated. We develop a method to adapt the modeled distribution parameters from each actual noisy speech in a frame-by-frame manner. Next, we derive and investigate the minimum mean square error (MMSE) and maximum a posteriori probability (MAP) estimations in different domains of speech spectral magnitude, generalized power and its logarithm, using the proposed gamma modeling. Finally, a comparative evaluation of the MAP and MMSE filters is conducted. As the MMSE estimations tend to become more complicated with more general prior distributions, the MAP estimations are given as closed-form expressions and are therefore suitable for implementation. The adaptive estimation of the modeled distribution parameters provides more accurate prior modeling; this is the principal merit of the proposed method and the reason for its better performance. From the experiments, the MAP estimation is recommended due to its high efficiency and low complexity. Among the MAP-based systems, estimation in the log-magnitude domain is shown to be the best for speech recognition, whereas estimation in the power domain is superior for noise reduction.
A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features
Makoto TACHIBANA Junichi YAMAGISHI Takashi MASUKO Takao KOBAYASHI

PAPER-Speech Synthesis

Vol:
E89-D No:3
Page(s):
1092-1099
This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both spectral and prosodic features. Therefore, it is essential to take account of these features in the model adaptation. The proposed technique, called style adaptation, deals with this issue. First, the maximum likelihood linear regression (MLLR) algorithm, based on a hidden semi-Markov model (HSMM) framework, is presented to provide a mathematically rigorous and robust adaptation of state duration and to adapt both the spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is presented to allow the incorporation of both the segmental and suprasegmental speech features into the style adaptation. The proposed tying method uses regression class trees with contextual information. From the results of several subjective tests, we show that these techniques can perform style adaptation while maintaining the naturalness of the synthetic speech.
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes
Takashi SAITO

PAPER-Speech Analysis

Vol:
E89-D No:3
Page(s):
1100-1106
This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.
Implementation and Evaluation of an HMM-Based Korean Speech Synthesis System
Sang-Jin KIM Jong-Jin KIM Minsoo HAHN

LETTER

Vol:
E89-D No:3
Page(s):
1116-1119
The development of a hidden Markov model (HMM)-based Korean speech synthesis system and its evaluation are described. Statistical HMM models for Korean speech units are trained with a hand-labeled speech database that includes contextual information about phoneme, morpheme, word phrase, utterance, and break strength. The developed system produced speech with fairly good prosody. The synthesized speech is evaluated and compared with that of our corpus-based unit-concatenating Korean text-to-speech system. The two systems were trained with the same manually labeled speech database.
Production-Oriented Models for Speech Recognition
Erik MCDERMOTT Atsushi NAKAMURA

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
1006-1014
Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof; state-to-state evolution is modeled only crudely, with no explicit relationship between states, such as would be afforded by the use of phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article attempts to provide an overview of proposals by several researchers for improving acoustic modeling in these regards. Such topics as the controversial Motor Theory of Speech Perception, work by Hogden explicitly using a continuity constraint in a pseudo-articulatory domain, the Kalman filter based Hidden Dynamic Model, and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech, will be covered.
Lower MAC Software Implementations for the IEEE 802.16 Standard
Ioannis PAPAIOANNOU Chrissavgi DRE

PAPER-Wireless Communication Technologies

Vol:
E89-B No:3
Page(s):
816-827
In this paper, the development of the control plane for the frame decoding functionality of an IEEE 802.16 Wireless MAN system is described. It is implemented in two ways. The first implementation is based on a general-purpose microprocessor, specifically the one provided in the Texas Instruments TMS320C64xx family devices. The second implementation is based on Intel's IXP2400 Network Processor chip, and the preceding functions are implemented by writing embedded software for that part. The two implementations are compared, and the comparison leads to some very useful results. Developing the time-critical tasks of a MAC protocol stack in software, mainly on a Network Processor, opens paths for very effective system architectures, where the Network Processor runs the full networking and MAC/DLC processing of such telecom systems. The main question is: Can the lower MAC be executed on a Network Processor or not? This manuscript attempts to answer this question.
Error Identification in At-Speed Scan BIST Environment in the Presence of Circuit and Tester Speed Mismatch
Yoshiyuki NAKAMURA Thomas CLOUQUEUR Kewal K. SALUJA Hideo FUJIWARA

PAPER-Dependable Computing

Vol:
E89-D No:3
Page(s):
1165-1172
In this paper, we provide a practical formulation of the problem of identifying all error occurrences and all failed scan cells in an at-speed scan-based BIST environment. We propose a method that can be used to identify every error when the circuit test frequency is higher than the tester frequency. Our approach requires very little extra hardware for diagnosis, and the test application time required to identify errors is a linear function of the frequency ratio between the circuit under test (CUT) and the tester.
What HMMs Can Do
Jeff A. BILMES

INVITED PAPER

Vol:
E89-D No:3
Page(s):
869-891
Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial article analyzes HMMs by exploring a definition of HMMs in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in search of a model to supersede the HMM (say for ASR), rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.
Expressive Power of Quantum Pushdown Automata with Classical Stack Operations under the Perfect-Soundness Condition
Masaki NAKANISHI Kiyoharu HAMAGUCHI Toshinobu KASHIWABARA

PAPER-Computation and Computational Models

Vol:
E89-D No:3
Page(s):
1120-1127
One important question for quantum computing is whether a computational gap exists between models that are allowed to use quantum effects and models that are not. Several types of quantum computation models have been proposed, including quantum finite automata and quantum pushdown automata (with a quantum pushdown stack). It has been shown that some quantum computation models are more powerful than their classical counterparts and others are not since quantum computation models are required to obey such restrictions as reversible state transitions. In this paper, we investigate the power of quantum pushdown automata whose stacks are assumed to be implemented as classical devices, and show that they are strictly more powerful than their classical counterparts under the perfect-soundness condition, where perfect-soundness means that an automaton never accepts a word that is not in the language. That is, we show that our model can simulate any probabilistic pushdown automata and also show that there is a non-context-free language which quantum pushdown automata with classical stack operations can recognize with perfect soundness.
Robust Speech Recognition by Using Compensated Acoustic Scores
Shoei SATO Kazuo ONOE Akio KOBAYASHI Toru IMAI

PAPER-Speech Recognition

Vol:
E89-D No:3
Page(s):
915-921
This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. It uses the likelihoods of the noise models in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of speech models and noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model as an alternative to that of a silence model when given noisy input. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
