IEICE global.ieice.org Site

Keyword Search Result

[Keyword] EE(4073hit)

2161-2180hit(4073hit)

Speaker Verification in Realistic Noisy Environment in Forensic Science
Toshiaki KAMADA Nobuaki MINEMATSU Takashi OSANAI Hisanori MAKINAE Masumi TANIMOTO

PAPER-Speaker Verification

Vol: E91-D No: 3
Page(s): 558-566
In forensic voice telephony speaker verification, we may be requested to identify a speaker in a very noisy environment, unlike the conditions in general research. In a noisy environment, we process speech first by clarifying it. However, the previous study of speaker verification from clarified speech did not yield satisfactory results. In this study, we experimented on speaker verification with clarification of speech in a noisy environment, and we examined the relationship between improving acoustic quality and speaker verification results. Moreover, experiments with realistic noise such as a crime prevention alarm and power supply noise were conducted, and speaker verification accuracy in a realistic environment was examined. We confirmed the validity of speaker verification with clarification of speech in a realistic noisy environment.
Signal Processing Techniques for Robust Speech Recognition
Futoshi ASANO

INVITED PAPER

Vol: E91-D No: 3
Page(s): 393-401
In this paper, signal processing techniques that can be applied to automatic speech recognition to improve its robustness are reviewed. The choice of signal processing techniques depends strongly on the application scenario. The analysis of the scenario and the choice of suitable signal processing techniques are shown through two examples.
Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR
Shoei SATO Akio KOBAYASHI Kazuo ONOE Shinichi HOMMA Toru IMAI Tohru TAKAGI Tetsunori KOBAYASHI

PAPER-Speech and Hearing

Vol: E91-D No: 3
Page(s): 815-824
We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real-time from mutual information between an input stream and active HMM states in a search space without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters by taking into account the human auditory system's ability to extract amplitude and frequency modulations. Due to this, features representing energy, amplitude drift, and resonant frequency drifts, are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced error words by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.
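The frame-wise weighting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform prior over the active states (so the marginal entropy is the logarithm of the number of active states, as the abstract suggests), takes the conditional entropy from the normalized state likelihoods of one frame, and normalizes the resulting mutual information across streams. The function names `mutual_information` and `stream_weights` are hypothetical.

```python
import math


def state_posteriors(log_likelihoods):
    """Turn per-state log-likelihoods of one frame into posteriors.

    Assumption: a uniform prior over the active states.
    """
    peak = max(log_likelihoods)
    exps = [math.exp(ll - peak) for ll in log_likelihoods]  # numerically stable
    total = sum(exps)
    return [e / total for e in exps]


def mutual_information(log_likelihoods):
    """Mutual information between one frame and the active states (in nats).

    The marginal entropy is log(number of active states); the conditional
    entropy is the entropy of the state posteriors.
    """
    n_active = len(log_likelihoods)
    if n_active < 2:
        return 0.0
    post = state_posteriors(log_likelihoods)
    conditional = -sum(p * math.log(p) for p in post if p > 0.0)
    return max(math.log(n_active) - conditional, 0.0)


def stream_weights(per_stream_log_likelihoods):
    """Normalize the per-stream mutual information into weights summing to 1."""
    mi = [mutual_information(ll) for ll in per_stream_log_likelihoods]
    total = sum(mi)
    if total < 1e-12:  # no stream is informative: fall back to equal weights
        return [1.0 / len(mi)] * len(mi)
    return [value / total for value in mi]


# Stream 0 separates the active states sharply; stream 1 barely does.
weights = stream_weights([[-1.0, -10.0, -10.0], [-2.0, -2.1, -1.9]])
```

In this toy frame the first stream receives almost all of the weight, because its likelihoods discriminate between the active states while the second stream's likelihoods are nearly flat.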
An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus
Jin-Song ZHANG Satoshi NAKAMURA

PAPER-Corpus

Vol: E91-D No: 3
Page(s): 615-630
An efficient way to develop large scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increased dramatically, making the search a non-trivial issue. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluated these algorithms in order to show their different characteristics. The experimental results showed that the least-to-most-ordered methods successfully achieved smaller objective sets with significantly less computation time than the conventional ones. This algorithm has already been applied to the development of a number of speech corpora, including a large scale phonetically rich Chinese speech corpus, ATRPTH, which played an important role in developing our multi-language translation system.
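As a rough illustration of the conventional greedy selection mentioned in this abstract, the sketch below repeatedly picks the sentence that covers the most still-uncovered phonetic units. It does not reproduce the least-to-most-ordered variant, whose details the abstract does not give. The function name `greedy_minimum_set`, the sentence identifiers, and the unit labels are all hypothetical.

```python
def greedy_minimum_set(sentences):
    """Select sentence ids until every phonetic unit is covered.

    sentences: dict mapping a sentence id to the set of phonetic
    contextual units that the sentence contains.
    """
    uncovered = set()
    for units in sentences.values():
        uncovered |= units

    selected = []
    while uncovered:
        # Pick the sentence with the largest number of new units;
        # ties go to the smallest id so the result is deterministic.
        best = max(sorted(sentences), key=lambda sid: len(sentences[sid] & uncovered))
        selected.append(best)
        uncovered -= sentences[best]
    return selected


corpus = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c"},
    "s3": {"c-d", "d-e", "a-b"},
}
minimum_set = greedy_minimum_set(corpus)
```

Each iteration rescans every candidate sentence, which is why the cost grows quickly once the number of distinct contextual units becomes large.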
Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition
Makoto SAKAI Norihide KITAOKA Seiichi NAKAGAWA

PAPER-Feature Extraction

Vol: E91-D No: 3
Page(s): 478-487
Precisely modeling the time dependency of features is one of the important issues for speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) or heteroscedastic discriminant analysis (HDA), are popular approaches to reduce dimensionality. However, it is difficult to find one particular criterion suitable for any kind of data set in carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework which we call power linear discriminant analysis (PLDA). PLDA can be used to describe various criteria including LDA, HLDA, and HDA with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs or testing recognition performance on a development data set. Experimental results show that PLDA is more effective than conventional methods for various data sets.
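The idea of a generalized mean governed by one control parameter can be shown with a scalar analogue. The sketch below applies the weighted power mean to class variances rather than to full covariance matrices, so it is an illustration of the concept and not the PLDA criterion itself. The function name `generalized_mean` and the parameter name `m` are assumptions.

```python
import math


def generalized_mean(values, weights, m):
    """Weighted power mean of positive values with control parameter m.

    m = 1 gives the arithmetic mean, m = -1 the harmonic mean, and the
    limit m -> 0 gives the geometric mean. Weights are assumed to sum to 1.
    """
    if abs(m) < 1e-12:
        # Limit case: weighted geometric mean.
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    return sum(w * v ** m for v, w in zip(values, weights)) ** (1.0 / m)


class_variances = [1.0, 4.0]   # hypothetical per-class variances
class_priors = [0.5, 0.5]      # hypothetical class weights

arithmetic = generalized_mean(class_variances, class_priors, 1.0)
geometric = generalized_mean(class_variances, class_priors, 0.0)
harmonic = generalized_mean(class_variances, class_priors, -1.0)
```

Sweeping `m` moves the result continuously between these familiar means, which is the sense in which a single parameter can select among several averaging criteria.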
Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation
Yotaro KUBO Shigeki OKAWA Akira KUREMATSU Katsuhiko SHIRAI

PAPER-ASR under Reverberant Conditions

Vol: E91-D No: 3
Page(s): 448-456
We have attempted to recognize reverberant speech using a novel speech recognition system that depends on not only the spectral envelope and amplitude modulation but also frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information on the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on the limited properties of the signal may easily be corrupted. In order to utilize an automatic speech recognizer in an unknown environment, using the information obtained from other signal properties and combining them is important to minimize the effects of the environment. In this paper, we propose a method to analyze carrier signals that are discarded in most speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other nonlinear discriminant analyzer is a pseudo-instantaneous frequency analyzer proposed in this paper. This analyzer can capture the frequency modulation of narrowband signals efficiently. The combination of these two analyzers is performed by the method based on the entropy of the feature introduced by Okawa et al. In this paper, in Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method is described in Sect. 3. The proposed system is described in Sect. 4. The experimental setup is presented in Sect. 5, and the results are discussed in Sect. 6.
We evaluate the performance of the proposed method by continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement with regard to the MFCC feature extraction system.
Intermediate-Hop Preemption to Improve Fairness in Optical Burst Switching Networks
Masayuki UEDA Takuji TACHIBANA Shoji KASAHARA

PAPER-Switching for Communications

Vol: E91-B No: 3
Page(s): 710-721
In optical burst switching (OBS) networks, bursts with different numbers of hops experience unfairness in terms of the burst loss probability. In this paper, we propose a preemptive scheme based on the number of transit hops in OBS networks. In our proposed scheme, preemption is performed with two thresholds; one is for the total number of hops of a burst and the other is for the number of transit hops the burst has passed through. We evaluate the performance of the scheme by simulation, and numerical examples show that the proposed scheme improves the fairness among the bursts with different numbers of hops, keeping the overall burst loss probability the same as that for the conventional OBS transmission without preemption.
Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
Longbiao WANG Seiichi NAKAGAWA Norihide KITAOKA

PAPER-ASR under Reverberant Conditions

Vol: E91-D No: 3
Page(s): 457-466
In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel) affected by reverberation can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance of neighboring frames is used to discriminate the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN) to compensate for channel distortion depending on speaker position, which is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted experiments based on our proposed method using limited vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
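A simplified sketch of the normalization steps named in this abstract follows. It labels each frame as static or non-static from the cepstral distance to its neighboring frame and subtracts the cepstral mean of the corresponding group. It is a simplification under stated assumptions: the same cepstra are used for both groups (the paper uses long-term and short-term spectral analysis), the distance is Euclidean, and the threshold value and all function names are hypothetical.

```python
import math


def cepstral_mean(frames):
    """Per-coefficient mean over a list of cepstral vectors."""
    count = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / count for d in range(dim)]


def cmn(frames):
    """Conventional CMN: subtract the utterance-level cepstral mean."""
    mean = cepstral_mean(frames)
    return [[c - mu for c, mu in zip(frame, mean)] for frame in frames]


def static_frame_labels(frames, threshold):
    """Label a frame static when it is close to the preceding frame.

    The first frame has no predecessor and inherits the next frame's label.
    """
    labels = []
    for t in range(1, len(frames)):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[t], frames[t - 1])))
        labels.append(dist < threshold)
    first = labels[0] if labels else True
    return [first] + labels


def variable_term_cmn(frames, threshold):
    """Normalize static and non-static frames by their own cepstral means."""
    labels = static_frame_labels(frames, threshold)
    means = {}
    for flag in (True, False):
        group = [f for f, s in zip(frames, labels) if s == flag]
        if group:
            means[flag] = cepstral_mean(group)
    return [[c - mu for c, mu in zip(f, means[s])] for f, s in zip(frames, labels)]


frames = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]  # toy 2-dimensional cepstra
normalized = variable_term_cmn(frames, threshold=3.0)
```

With these toy frames the first two are labeled static and share one mean, while the third is treated as non-static and normalized separately.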
Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval
Qingqing ZHANG Jielin PAN Yang LIN Jian SHAO Yonghong YAN

PAPER-Acoustic Modeling

Vol: E91-D No: 3
Page(s): 514-521
In recent decades, there has been a great deal of research into the problem of bilingual speech recognition - to develop a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real world music retrieval. Two of the main difficulties in handling bilingual speech recognition systems for real world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to effectively deal with the matrix language accents in the embedded language. In order to process the intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed instead of using two separate monolingual models for each language. In our study, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments show that TCM can achieve better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciations of English as the embedded language usually contain Mandarin accents. In order to deal with the matrix language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP).
With the effective incorporation of approaches on phone clustering and non-native adaptation, the Phrase Error Rate (PER) of MESRS for English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. The performance for bilingual utterances achieved a 22.37% relative PER reduction.
Race-Free Mixed Serial-Parallel Comparison for Low Power Content Addressable Memory
Seong-Ook JUNG Sei-Seung YOON

LETTER-VLSI Design Technology and CAD

Vol: E91-A No: 3
Page(s): 895-898
This letter presents a race-free mixed serial-parallel comparison (RFMSPC) scheme which uses both serial and parallel CAMs in a match line. A self-reset search line scheme for the serial CAM is proposed to avoid the timing race problem and additional timing penalties. Various 32-entry CAMs are designed using a 90 nm 1.2 V CMOS process to verify the proposed RFMSPC scheme. The results show that RFMSPC reduces power consumption by 40%, 53% and 63% at the cost of a 4%, 6% and 16% increase in search time for 1, 2, and 4 serial CAM bits in a match line, respectively.
Designing Algebraic Trellis Code as a New Fixed Codebook Module for ACELP Coder
Jakyong JUN Sangwon KANG Thomas R. FISCHER

LETTER-Multimedia Systems for Communications

Vol: E91-B No: 3
Page(s): 972-974
In this paper, a block-constrained trellis coded quantization (BC-TCQ) algorithm is combined with an algebraic codebook to produce an algebraic trellis code (ATC) to be used in ACELP coding. In ATC, the set of allowed algebraic codebook pulse positions is expanded, and the expanded set is partitioned into subsets of pulse positions; the trellis branches are labeled with these subsets. The list Viterbi algorithm (LVA) is used to select the excitation codevector. The combination of an ATC codebook and LVA trellis search algorithm is denoted as an ATC-LVA block code. The ATC-LVA block code is used as the fixed codebook of the AMR-WB 8.85 kbps mode, reducing complexity compared to the conventional algebraic codebook.
Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models
Takatoshi JITSUHIRO Tomoji TORIYAMA Kiyoshi KOGURE

PAPER-Noisy Speech Recognition

Vol: E91-D No: 3
Page(s): 402-410
We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognized candidates, suppressing many kinds of noise signals at once and finding target speech is important. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models including many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is frame-synchronously performed using the multiple models selected by recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.
A Design of the Signal Processing Hardware Platform for Communication Systems
Byung Wook LEE Sung Ho CHO

LETTER-Wireless Communication Technologies

Vol: E91-B No: 3
Page(s): 939-942
In this letter, an efficient hardware platform for the digital signal processing for OFDM communication systems is presented. The hardware platform consists of a single FPGA having 900 K gates, two DSPs with maximum 8,000 MIPS at 1 GHz clock, 2-channel ADC and DAC supporting maximum 125 MHz sampling rate, and flexible data bus architecture, so that a wide variety of baseband signal processing algorithms for practical OFDM communication systems may be implemented and tested. The IEEE 802.16d software modem is also presented in order to verify the effectiveness and usefulness of the designed platform.
Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors
Mohammad NURUL HUDA Muhammad GHULAM Takashi FUKUDA Kouichi KATSURADA Tsuneo NITTA

PAPER-Feature Extraction

Vol: E91-D No: 3
Page(s): 488-498
This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and the acoustic environment. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors corresponding to each hidden factor canonicalization, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalization based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy at low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency cepstral coefficient (MFCC) parameters. Moreover, the proposed method requires only two-fifths of the Gaussian mixture components and less memory to achieve accurate ASR.
Experimental Evaluation of the Super Sweep Spectrum Analyzer
Masao NAGANO Toshio ONODERA Mototaka SONE

PAPER-Digital Signal Processing

Vol: E91-A No: 3
Page(s): 782-790
The sweep spectrum analyzer has been improved over the years, but the fundamental method had not changed until the 'Super Sweep' method appeared. The 'Super Sweep' method has been expected to break the limitation of the conventional sweep spectrum analyzer, a limit of the maximum sweep rate which is in inverse proportion to the square of the frequency resolution. The superior performance of the 'Super Sweep' method, however, has not yet been proved experimentally. This paper gives an experimental evaluation of the 'Super Sweep' spectrum analyzer, whose theoretical concepts have already been presented by the authors of this paper. Before giving the experimental results, we give a complete analysis of a sweep spectrum analyzer and express the principle of the super-sweep operation with a complete set of equations. We developed an experimental system whose components operated in an optimum condition as the spectrum analyzer. Then we investigated its properties, a peak level reduction and broadening of the frequency resolution of the measured spectrum, by changing the sweep rate. We also confirmed that the experimental system satisfactorily detected the spectrum at least 30 times faster than the conventional method and that the sweep rate was in proportion to the bandwidth of the base band signal to be analyzed. We proved that the 'Super Sweep' method broke the restriction of the sweep rate put on a conventional sweep spectrum analyzer.
An Effective Load Balancing Scheme for 3D Texture-Based Sort-Last Parallel Volume Rendering on GPU Clusters
Won-Jong LEE Vason P. SRINI Woo-Chan PARK Shigeru MURAKI Tack-Don HAN

PAPER-Computer Graphics

Vol: E91-D No: 3
Page(s): 846-856
We present an adaptive dynamic load balancing scheme for 3D texture based sort-last parallel volume rendering on a PC cluster equipped with GPUs. Our scheme exploits not only task parallelism but also data parallelism during rendering by combining the hierarchical data structures (octree and parallel BSP tree) in order to skip empty regions and distribute proper workloads to rendering nodes. Our scheme can also conduct a valid parallel rendering and image compositing in visibility order by employing a 3D clustering algorithm. To alleviate the imbalance when the transfer function is changed, a load rebalancing is inexpensively supported by exchanging only needed data. A detailed performance analysis is provided and scaling characteristics of our scheme are discussed. These show that our scheme can achieve significant performance gains by increasing parallelism and decreasing synchronizing costs compared to the traditional static distribution schemes.
A Feasibility Study of Fuzzy FES Controller Based on Cycle-to-Cycle Control: An Experimental Test of Knee Extension Control
Takashi WATANABE Tomoya MASUKO Achmad ARIFIN Makoto YOSHIZAWA

LETTER-Rehabilitation Engineering and Assistive Technology

Vol: E91-D No: 3
Page(s): 865-868
Functional Electrical Stimulation (FES) can be effective in assisting or restoring paralyzed motor functions. The purpose of this study is to examine experimentally the fuzzy controller based on cycle-to-cycle control for FES-induced gait. A basic experimental test was performed on controlling the maximum knee extension angle with normal subjects. In most of the control trials, the joint angle was controlled well, compensating for changes in muscle responses to electrical stimulation. The results show that the fuzzy controller would be practical in clinical applications of gait control by FES. In practice, automatic parameter tuning would be required for quick responses in reaching the target and in compensating for the change in muscle responses without causing oscillating responses.
On the Generative Power of Multiple Context-Free Grammars and Macro Grammars
Hiroyuki SEKI Yuki KATO

PAPER-Formal Language Theory

Vol: E91-D No: 2
Page(s): 209-221
Several grammars whose generative power lies between that of context-free grammar and context-sensitive grammar have been proposed. Among them are macro grammar and tree adjoining grammar. Multiple context-free grammar is also a natural extension of context-free grammar, and is known to be stronger in its generative power than tree adjoining grammar and yet to be recognizable in polynomial time. In this paper, the generative power of several subclasses of variable-linear macro grammars and that of multiple context-free grammars are compared in detail.
Design and Experiments of a Novel Low-Ripple Cockcroft-Walton AC-to-DC Converter for a Coil-Coupled Passive RFID Tag
Toshitaka YAMAKAWA Takahiro INOUE Akio TSUNEDA

PAPER

Vol: E91-A No: 2
Page(s): 513-520
A low-ripple diode charge-pump type AC-DC converter based on the Cockcroft-Walton diode multiplier is proposed for coil-coupled passive IC tags in this paper. This circuit is developed as a power supply for passive RFID tags with smart functions such as heart rate detection and/or body temperature measurement. The proposed circuit converts wirelessly induced power to a low-ripple DC voltage suitable for a 13.56 MHz RFID tag. The proposed circuit topology and the principle of operation are explained and treated theoretically by using quasi-equivalent small-signal models. The proposed circuit was implemented on a PCB, and it was confirmed that the circuit provides 3.3 V DC with a ripple of less than 20 mV when a 4 Vp-p sinusoidal input is applied. Under this condition, the maximum output power is about 310 µW. The measured results were in good agreement with theoretical and HSPICE simulation results.
Achieving Weighted Fairness and Efficient Channel Utilization in IEEE 802.11e WLANs
Wei ZHANG Jun SUN Xinbing WANG

LETTER-Wireless Communication Technologies

Vol: E91-B No: 2
Page(s): 653-657
This paper addresses the problem of maximizing the protocol capacity of 802.11e networks, under the assumption that each access category (AC) has the same packet length. We prove that the maximal protocol capacity can be achieved at an optimal operating point with the medium idle probability of , where Tc* is the duration of collision time in terms of slot unit. Our results indicate that the optimal operating point is independent of the number of stations and throughput ratio among ACs, which means the proposed analytical results still hold even when throughput ratio and station number are time-varying. Further, we show that the maximal protocol capacity can be achieved in saturated cases by properly choosing the protocol parameters. We present a parameter configuration algorithm to achieve both efficient channel utilization and proportional fairness in IEEE 802.11e EDCA networks. Extensive simulation and analytical results are presented to verify the proposed ideas.
